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ABSTRACT 

Linear prediction theory has had a profound impact in the field of digital signal processing. 
Although the theory dates back to the early 1940s, its influence can still be seen in applications 
today. The theory is based on very elegant mathematics and leads to many beautiful insights into 
statistical signal processing. Although prediction is only a part of the more general topics of linear 
estimation, filtering, and smoothing, this book focuses on linear prediction. This has enabled 
detailed discussion of a number of issues that are normally not found in texts. For example, the 
theory of vector linear prediction is explained in considerable detail and so is the theory of line 
spectral processes. This focus and its small size make the book different from many excellent texts 
which cover the topic, including a few that are actually dedicated to linear prediction. There are 
several examples and computer-based demonstrations of the theory. Applications are mentioned 
wherever appropriate, but the focus is not on the detailed development of these applications. The 
writing style is meant to be suitable for self-study as well as for classroom use at the senior and 
first-year graduate levels. The text is self-contained for readers with introductory exposure to signal 
processing, random processes, and the theory of matrices, and a historical perspective and detailed 
outline are given in the first chapter. 
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Preface 




Linear prediction theory has had a profound impact in the field of digital signal processing. 
Although the theory dates back to the early 1940s, its influence can still be seen in applications 
today. The theory is based on very elegant mathematics and leads to many beautiful insights into 
statistical signal processing. Although prediction is only a part of the more general topics of linear 
estimation, filtering, and smoothing, I have focused on linear prediction in this book. This has 
enabled me to discuss in detail a number of issues that are normally not found in texts. For example, 
the theory of vector linear prediction is explained in considerable detail and so is the theory of 
line spectral processes. This focus and its small size make the book different from many excellent 
texts that cover the topic, including a few that are actually dedicated to linear prediction. There are 
several examples and computer-based demonstrations of the theory. Applications are mentioned 
wherever appropriate, but the focus is not on the detailed development of these applications. 

The writing style is meant to be suitable for self-study as well as for classroom use at the 
senior and first-year graduate levels. Indeed, the material here emerged from classroom lectures that 
I had given over the years at the California Institute of Technology. So, the text is self-contained 
for readers with introductory exposure to signal processing, random processes, and the theory of 
matrices. A historical perspective and a detailed outline are given in Chapter 1. 
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CHAPTER 1 

Introduction 




Digital signal processing has influenced modern technology in many wonderful ways. During the 
course of signal processing history, several elegant theories have evolved on a variety of important 
topics. These theoretical underpinnings, of which we can be very proud, are certainly at the heart 
of the crucial contributions that signal processing has made to the modern technological society we 
live in. 

One of these is the theory of linear prediction. This theory, dating back to the 1940s, is 
fundamental to a number of signal processing applications. For example, it is at the center of many 
modern power spectrum estimation techniques. A variation of the theory arises in the identification 
of the direction of arrival of an electromagnetic wave, which is important in sensor networks, array 
processing, and radar. The theory has been successfully used for the representation, modeling, 
compression, and computer generation of speech waveforms. It has also given rise to the idea of 
line spectrum pairs, which are used in speech compression based on perceptual measures. More 
recently, vector versions of linear prediction theory have been applied for the problem of blind 
identification of noisy communication channels. 

Another product of the theory is a class of filtering structures called lattice structures. These 
have been found to be important in speech compression. The same structures find use in the design 
of robust adaptive digital filter structures. The infinite impulse response (IIR) version of the linear 
prediction lattice is identical to the well-known all-pass lattice structure that arises in digital filter 
theory. The lattice has been of interest because of its stability and robustness properties despite 
quantization. 

In this book, we give a detailed presentation of the theory of linear prediction and place in 
evidence some of the applications mentioned above. Before discussing the scope and outline, it is 
important to have a brief glimpse of the history of linear prediction. 

1.1 HISTORY OF LINEAR PREDICTION 

Historically, linear prediction theory can be traced back to the 1941 work of Kolmogorov, who 
considered the problem of extrapolation of discrete time random processes. Other early pioneers 
are Levinson (1947), Wiener (1949), and Weiner and Masani (1958), who showed how to extend 
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the ideas for the case of multivariate processes. One of Levinson's contributions, , which for some 
reason he regarded as "mathematically trivial," is still in wide use today (Levinson's recursion). For 
a very detailed scholarly review of the history of statistical filtering, we have to refer the reader to 
the classic article by Kailath (1974), wherein the history of linear estimation is traced back to its 
very roots. 

An influential early tutorial is the article by John Makhoul (1975) , which reviews the 
mathematics of linear prediction, Levinson's recursion, and so forth and makes the connection to 
spectrum estimation and the representation of speech signals. The connection to power spectrum 
estimation is studied in great detail in a number of early articles [Kay and Marple, 1981; Robinson, 
1982; Marple, 1987; Kay, 1988]. Connections to maximum entropy and spectrum estimation 
techniques can be found in Kay and Marple (1981), Papoulis (1981), and Robinson (1982). Articles 
that show the connection to direction of arrival and array processing include Schmidt (1979), 
Kumaresan (1983), and Paulraj et al. (1986). 

Pioneering work that explored the application in speech coding includes the work of Atal 
and Schroeder (1970) and that of Itakura and Saito (1970). Other excellent references for this 
are Makhoul (1975), Rabiner and Schafer (1978), Jayant and Noll (1984), Deller et al. (1993), 
and Schroeder (1999). Applications of multivariable models can be found even in the early image 
processing literature (e.g., see Chellappa and Kashyap, 1985). 

The connection to lattice structures was studied by Gray and Markel (1973) and Makhoul 
7 7), and their applications in adaptive filtering and channel equalization were studied by a 
number of authors (e.g., Satorius and Alexander, 1979). This application can be found in a number 
of books (e.g., Haykin, 2002; Sayed, 2003). All-pass lattice structures that arise in digital filter 
theory are explained in detail in standard signal processing texts (Vaidyanathan, 1993; Proakis and 
Manolakis, 1996; Oppenheim and Schafer, 1999; Mitra, 2001; Antoniou, 2006). 

Although the history of linear prediction can be traced back to the 1940s, it still finds new 
applications. This is the beauty of any solid mathematical theory. An example is the application of 
the vector version of linear prediction theory in blind identification of noisy finite impulse response 
(FIR) channels (Gorokhov and Loubaton, 1999; Lopez- Valcarce and Dasgupta, 2001). 

1.2 SCOPE AND OUTLINE 

Many signal processing books include a good discussion of linear prediciton theory, for example, 
Markel and Gray (1976), Rabiner and Schafer (1978), Therrien (1992), Deller et al. (1993), 
Anderson and Moore (1979), Kailath et al. (2000), Haykin (2002), and Sayed (2003). The book 
by Strobach (1990) is dedicated to an extensive discussion of linear prediction, and so is the early 
book by Markel and Gray (1976), which also discusses application to speech. The book by Therrien 
(1992) not only has a nice chapter on linear prediction, it also explains the applications in spectrum 
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estimation extensively. Another excellent book is the one by Kailath et al. (2000), which focuses on 
the larger topic of linear estimation. Connections to array processing are also covered by Therrien 
(1992) and Van Trees (2002). 

This short book focuses on the theory of linear prediction. The tight focus allows us to 
include under one cover details that are not normally found in other books. The style and emphasis 
here are different from the above references. Unlike most of the other references, we have included 
a thorough treatment of the vector case (multiple-input/multiple-output linear predictive coding, 
or MIMO LPC) in a separate chapter. Another novelty is the inclusion of a detailed chapter on the 
theory of line spectral processes. There are several examples and computer-based demonstrations 
throughout the book to enhance the theoretical ideas. Applications are briefly mentioned so that 
the reader can see the connections. The reader interested in these applications should peruse some 
of the references mentioned above and references in the individual chapters. 

Chapter 2 introduces the optimal linear prediction problem and develops basic equations for 
optimality, called the normal equations. A number of properties of the solution are also studied. 
Chapter 3 introduces Levinson's recursion, which is a fast procedure to solve the normal equations. 
This recursion places in evidence further properties of the solution. In Chapter 4, we develop lattice 
structures for linear prediction. Chapter 5 is dedicated to the topic of autoregressive modeling, 
which has applications in signal representation and compression. 

In Chapter 6, we develop the idea of flatness of a power spectrum and relate it to predictability 
of a process. Line spectral processes are discussed in detail in Chapter 7. One application is in the 
identification of sinusoids in noise, which is similar to the problem of identifying the direction of 
arrival of an electromagnetic wave. The chapter also discusses the theory of line spectrum pairs, 
which are used in speech compression. 

In Chapter 8, a detailed discussion of linear prediction for the MIMO case (case of vector 
processes) is presented. The MIMO lattice and its connection to paraunitary matrices is also 
explored in this chapter. The appendices include some review material on linear estimation as well 
as a few details pertaining to some of the proofs in the main text. 

A short list of homework problems covering most of the chapters is included at the end, 
before the bibliography section. 

1.2.1 Notations 

Boldface letters such as A and v indicate matrices and vectors. Superscript T,*, and fas in A , A*, 
and A denote, respectively, the transpose, conjugate, and transpose— conjugate of a matrix. The 
tilde notation on a function of z is defined as follows: 

H( Z )=H t (l/ z *) 
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Thus, 

H(z) = 5^h(fi)z - " =► H(z) = ^h f («)z n , 

n n 

so that the tilde notation effectively replaces all coefficients with the transpose conjugates and 
replaces z with 1/z. For example, 

H(z) = 6(0) + h(l)z~ l =►#(«) = #*(0) + A*(l)z 
and 

«o+«iZ _ k fx / N ao + a* z 

H(z) = r =$■ H (z) = — . 

W 1 + hz- 1 \+btz 

Note that the tilde notation reduces to transpose conjugation on the unit circle: 

H(e' ;w ) = HV'*0 

As mentioned in the preface, the text is self-contained for readers with introductory exposure to 
signal processing, random processes, and matrices. The determinant of a square matrix A is denoted 
as det A and the trace as Tr A. Given two Hermitian matrices A and B, the notation A > B means 
that A — B is positive semidefinite, and A > B means that A — B is positive definite. 



CHAPTER 2 




The Optimal Linear Prediction 

Problem 



2.1 INTRODUCTION 

In this chapter, we introduced the optimal linear prediction problem. We develop the equations for 
optimality and discuss some properties of the solution. 

2.2 PREDICTION ERROR AND PREDICTION POLYNOMIAL 

Let x(n) be a wide sense stationary (WSS) random process (Papoulis, 1965), possibly complex. 
Suppose we wish to predict the value of the sample x[n) using a linear combination of TV most 
recent past samples. The estimate has the form 

TV 

*7v(") = _ z2 a ^' X (" ~ ')■ ( 2 -^ 

1=1 

The integer N is called the prediction order. Notice the use of two subscripts for a, the first one 

being the prediction order. The superscript y on the left is a reminder that we are discussing the 

"forward" predictor, in contrast to the "backward predictor" to be introduced later. The estimation 

error is 

f f 

e J N (n)=x(n)-x J N (n), (2.2) 

that is, 

N 

e N (n) = x(n) + ^J «iv,<' x(n - i). (2.3) 




i=i 



We denote the mean squared error as £ 



iV 



£^E[\e f N (n)\\ (2.4) 



In view of WSS property, this is independent of time. The optimum predictor (i.e., the optimum 
set of coefficients a% ■_,-) is the one that minimizes this mean squared value. From Eq. (2.3), we see 
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x(n) 



A N (z) 
FIR prediction filter 



► 4(") 



4< n > 



(b) 



UA N (z) 

IIR inverse filter 



+ x(n) 



FIGURE 2.1: (a) The FIR prediction filter and (b) its inverse. 



that the prediction error elAn) can be regarded as the output of an FIR filter A^(z) in response to 
the WSS input x(n). See Fig. 2.1. The FIR filter transfer function is given by 



Mz) 



N 



(2.5) 



The IIR filter 1/Ajy(z) can therefore be used to reconstruct x(n) from the error signal e N (n) (Fig. 

2.1(b)). The conjugate sign on a^j in the definition (2.5) is for future convenience. Because e N (n) 

is the output of a filter in response to a WSS input, we see that e N (n) is itself a WSS random 

process. Aj^(z) is called the prediction polynomial, although its output is only the prediction error. 

Thus, linear prediction essentially converts the signal x(n) into the set of N numbers {#jv<} 
f f 

and the error signal e N (n). We will see later that the error e N (n) has a relatively flat power spectrum 

compared with x(n). For large N, the error is nearly white, and the spectral information of x(n) is 

mostly contained in the coefficient {«7v.;}. This fact is exploited in data compression applications. 

The technique of linear predictive coding (LPC) is the process of converting segments of a real time 

signal into the small set of numbers {fljv,;} for storage and transmission (Section 5.6). 



2.3 THE NORMAL EQUATIONS 

From Appendix A.l, we know that the optimal value of a^j should be such that the error elAn) is 
orthogonal to x(n — i), that is, 



ft 



E[e J N (n)x* (n - i)\ =0, 1 < i < N. 



(2.6) 



This condition gives rise to N equations similar to Eq. (A.13). The elements of the matrices R and 
r, defined in Eq. (A.13), now have a special form. Thus, 



[Rl, 



E\x(r 



i)x* (t 



*)], <i,m<N-l. 
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Define R(k) to be the autocorrelation sequence of the WSS process x{n), that is, 1 



R(k) =E[x{n)x*{n - k)}. 

Using the fact that R(k) = R* (—k), we can then simplify Eq. (A.13) to obtain 

R(0) R(l) ...R(N-1 

R*(l) R(0) ...R(N-2 

R*(N-1) R*(N-2) ... R(0) 



(2.7) 



)" 




a N.\ 




R*(l) 


) 




a N.2 


= — 


R*{2) 






a N,N 




R*(N) 



(2. 



For example, with iV = 3, we get 



21(0) R(l) R(2) ' 




«3,1 




R*(l) 


R*(l) R(0) R(l) 




«3,2 


= - 


R* (2) 


R*(2) R*(l) R(0) 




«3,3 




R* (3) 



(2.9) 



These equations have been known variously in the literature as normal equations, Yule— Walker 
equations, and Wiener— Hopf equations . We shall refer to them as normal equations. We can find a 
unique set of optimal predictor coefficients «jv.» as l° n g as the N X N matrix R^v is nonsingular. 
Singularity of this matrix will be analyzed in Section 2.4.2. Note that the matrix Rjv is Toeplitz, 
that is, all the elements on any line parallel to the main diagonal are identical. We will elaborate 
more on this later. 

Minimum-phase property. The optimal predictor polynomial A N (z) in Eq. (2.5), which 
is obtained from the solution to the normal equations, has a very interesting property. Namely, 
all its zeros, z^ satisfies \z^\ < 1. That is, Aj^(z) is a mimimum-phase polynomial. In fact, the 
zeros are strictly inside the unit circle \Zk\ < 1, unless x(n) is a very restricted process called a 
line spectral process (Section 2.4.2). The minimum-phase property guarantees that the IIR filter 
l/-4zv(z) is stable. The proof of the minimum-phase property follows automatically as a corollary 
of the so-called Levinson's recursion, which will be presented in Section 3.2. A more direct proof 
is given in Appendix C. □ 



A more appropriate notation would be R xx (k) , but we have omitted the subscript for simplicity. 
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2.3.1 Expression for the Minimized Mean Square Error 

We can use Eq. (A. 17) to arrive at the following expression for the minimized mean square error: 



N 



e£=R(o) + J2*fo R '(*)- 



Because £jy is real, we can conjugate this to obtain 



A' 



e£=R(0)+J2a Nti R(i). {2.10) 

r 

Next, because e N {n) is orthogonal to the past N samples x{n — k), it is also orthogonal to the 
estimate 'x N {n). Thus, from x{n) = x£/n) + e^{n), we obtain 

E[\ X (n)\ 2 }=E[\x( n )\ 2 }+E[\eM 2 }- {2.11) 



Thus the mean square value of x{n) is the sum of mean square values of the estimate and the 
estimation error. 

Example 2.1: Second-Order Optimal Predictor. Consider a real WSS process with auto- 
correlation sequence 

R(k) = (24/5) x 2~ w - (27/10) x 3" W . {2.12) 

The values of the first few coefficients of R{k) are 

R(0) = 2.1, R{1) = 1.5, R{2) = 0.9, . . . {2.13) 

The first-order predictor produces the estimate 

^c{{n) = —a\\x{n — 1), {2-14) 

and the optimal value of a\\ is obtained from R{§)a\\ = —R{1), that is, 

R{1) 5 

The optimal predictor polynomial is 

A x { z ) = 1 + a lA z~ l = 1 - (5/7)z _1 {2.15) 
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The minimized mean square error is, from Eq. (2.10), 

E[ = R(0) + a ltl R{l) = 36/35. (2.16) 

The second-order optimal predictor coefficients are obtained by solving the normal equations Eq. 
(2.8) with N= 2, that is, 



2.1 1.5 
1.5 2.1 



«2,1 
«2,2 



1.5 

0.9 



(2.17) 



The result is «2,1 = — 5/6 and «2,2 = 1/6. Thus, the optimal predictor polynomial is 

A 2 (z) = 1 - (S/6)z~ l + (l/6)z- 2 . (2.18) 

The minimized mean square error, computed from Eq. (2.10), is 



•/ 



El = R(0) + a 2A R(l) + a 2a R(2) = 1.0. 



(2.19) 



2.3.2 The Augmented Normal Equation 

We can augment the error information in Eq. (2.10) to the normal equations (2.8) simply by 
moving the right hand side in Eq. (2.8) to the left and adding an extra row at the top of the matrix 
Rjv- The result is 



R(0) R(l) 
R*(l) R(0) 

R*(tf) R*(N-1) 



■ RM 

.R(N- 1) 

■ R(o) 



l 

a N.\ 




£ 



f 



(2.20) 



This is called the augmented normal equation for the iVth-order optimal predictor and will be used 
in many of the following sections. We conclude this section with a couple of remarks about normal 
equations. 

1. Is any Toeplitz matrix an autocorrelation? Premultiplying both sides of the augmented 
equation by vector ajy, we obtain 



&N&N+1&N — & 



N- 



(2.21) 



Because Rjv+i is positive semidefinite, this gives a second verification of the (obvious) fact 
that £ N > 0. This observation, however, has independent importance. It can be used to 
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prove that any positive definite Toeplitz matrix is an autocorrelation of some WSS process 
(see Problem 14). 
2. From predictor coefficients to autocorrelations. Given the set of autocorrelation coefficients 
R(0) , i?(l) , . . . , R(N) , we can uniquely identify the N predictor coefficients ap/.i and the 
mean square error EL by solving the normal equations (assuming nonsingularity) and then 
using Eq. (2.10). Conversely, suppose we are given the solution a^j and the error £^. 
Then, we can work backward and uniquely identify the autocorrelation coefficients R(k), 
< k < N. This result, perhaps not obvious, is justified as part of the proof of Theorem 
5.2 later. 



2.4 PROPERTIES OF THE AUTOCORRELATION MATRIX 

The N X N autocorrelation matrix R^ of a WSS process can be written as 



R 



■N 



E[x(n)x'(n)) 



(2.22) 



where 



x(n) = x(n) x(n — 1) ... x(n — N + 1) 



iT 



(2.23) 



For example, the 3x3 matrix R3 in Eq. (2.9) is 



R; 



x(n) 
x(n — 1) 
x(n — 2) 



x* (n) x* (n — 1) x* (n — 2) 



This follows from the definition R(k) = E[x(n)x* (n — k)\ and from the property R(k) = R* (—k) . 
We first observe some of the simple properties of R^y: 

1. Positive defmiteness. Because x(n)x'(n) is Hermitian and positive semidefmite, R^y is also 
Hermitian and positive semidefmite. In fact, it is positive definite as long as it is nonsingular. 
Singularity is discussed in Section 2.4.2. 

2. Diagonal elements. The quantity R(0) appears on all the diagonal elements and is the mean 
square value of the random process, that is, R(0) = £[|*(»)| ]. 
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3. Toeplitz property . We observed earlier that R^v has the Toeplitz property, that is, the (k, m) 
element of R/v depends only on the difference m — k. To prove the Toeplitz property 
formally, simply observe that 

[R N ] km = E[x{n - k)x*{n - m)\ 
= E\x\n)x* (n — m + k)] 
= R(m- k). 

The Toeplitz property is a consequence of the WSS property of x(n). In Section 3.2, we 
will derive a fast procedure called the Levinson's recursion to solve the normal equations. 
This recursion is made possible because of the Toeplitz property of R^y. 

4. A filtering interpretation of eigenvalues. The eigenvalues of the Toeplitz matrix R^y can be 
given a nice interpretation. Consider an FIR filter 

V(z) = v$ + vfz~ 1 + ...+ v* N _ x z~ {N - l) (2.24) 

with input x(n). Its output can be expressed as 

y(n) = VQx(n) + v*x(n — 1) + . . . + v%_\ x(n — N + 1) = v'x(«), 

where 

v 1 = v Q v\ ... v N _ 1 

The mean square value of y(n) is i?[|v'x(w)| ] = v'Z?[x(m)x' (w)]v = v' Ra/v. Thus, 

E[\y(n)\ 2 ]=^R NV . (2.25) 

That is, given the Toeplitz autocorrelation matrix R^y, the quadratic form v'R^yv is the 
mean square value of the output of the FIR filter V(z), with input signal x(n). In particular, 
let v be an unit-norm eigenvector of R^v with eigenvalue A, that is, 

Rtvv = Av, v v = 1 . 

Then, 

v t R 7V v=A. (2.26) 

Thus, any eigenvalue of R^v can be regarded as the mean square value of the output y(n) for 
appropriate choice of the unit-energy filter V(z) driven by x(n). 

In the next few subsections, we study some of the deeper properties of the autocorrelation matrix. 
These will be found to be useful for future discussions. 
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2.4.1 Relation Between Eigenvalues and the Power Spectrum 

Suppose A;, < i < N — 1 are the eigenvalues of R^v- Because the matrix is positive semidefinite, 
we know that these are real and nonnegative. Now, let S xx (e /cu ) be the power spectrum of the 
process x(n), that is, 



*Jxx\^ J 



k— — oo 



R(k)e 



(2.27) 



We know S xx (e JUJ ) > 0. Now, let S mm and S max denote the extreme values of the power spectrum 
(Fig. 2.2). We will show that the eigenvalues A; are bounded as follows: 



< A,- < S„ 



(2.28) 



For example, if x(n) is zero-mean white, then ^(e JiJ ) is constant, and all the eigenvalues are equal 
(consistent with the fact that R^ is diagonal with all diagonal elements equal to R(0)). 

Proof of Eq. (2.28). Consider again a filter V(z) as in (2.24) with input x(n) and denote the 
output as y(n). Because the mean square value is the integral of the power spectrum, we have 

r-2jr 



Or* 



1 

2^ 



S„(e^)div 



1 f 27r 
— / ^(e^)|F(e^)| 2 do; 



<S„ 



2tt 



\v( 



J"\\2 



duJ 
2^ 



,v'v 



where the last equality follows from Parseval's theorem. Similarly, Zs[|jy(«)| 2 ] > SminvV. With v 
constrained to have unit norm (v'v =1), we then have 



<E\ 



2 ] < S a 



(2.29) 



A 


k 










§ 














A 


V 


S xx (e ) 




C ^\--' 


1 


All eigenva 
are here 

r 




O/MZW 














CO 










r 









In 





FIGURE 2.2: The relation between power spectrum and eigenvalues of the autocorrelation matrix. 
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poorly conditioned 



Sxx(e J 




FIGURE 2.3: Examples of power spectra that are well and poorly conditioned. 



But we have already shown that any eigenvalue of R^v can be regarded as the mean square value 
2s[[y(«) | 2 for appropriate choice of the unit-energy filter V(z) (see remark after Eq. (2.26)). Because 
(2.29) holds for every possible output y(n) with unit-energy V(z), Eq. (2.28), therefore, follows. □ 

With A min and A max denoting the extreme eigenvalues of R^v, the ratio 



AT 



A mi r\ 



> l 



(2. 30) 



is called the condition number of the matrix. It is well-known (Golub and Van Loan, 1989) that if 
this ratio is large, then numerical errors tend to have a more severe effect during matrix inversion 
(or when trying to solve Eq. (2.8)). It is also known that the condition number cannot decrease as 
the size N increases (Problem 12). If the condition number is close to unity, we say that the system 
of equations is well-conditioned. By comparison with Eq. (2.28), we see that 



l <M 



A, 



A, 



"min 



(2.31) 



Thus, if a random process has a nearly flat power spectrum (i.e., S ma x/ S mm ~ 1), it can be 
considered to be a well-conditioned process. If the power spectrum has a wide range, it is possible 
that the process is poorly conditioned (i.e., J\f could be very large). See Fig. 2.3 for demonstration. 
Also see Problem 11. 

2.4.2 Singularity of the Autocorrelation Matrix 

If the autocorrelation matrix is singular, then the corresponding random variables are linearly 
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x(n) ■ 



V(z) 
FIR 



-> y(n) = 

for all time 



FIGURE 2.4: The FIR filter V(z), which annihilates a line spectral process. 



dependent (Section A.4). To be more quantitative, let us assume that R i+ i [the (L + 1) x (L + 1) 
matrix] is singular. Then, proceeding as in Section A. 4, we conclude that 



vq x(n) + v* x(n — 1) + . . . + v£ x(n — L) = 0, 



(2.32) 



where not all v/s are zero. This means, in particular, that if we measure a set of L successive samples 
of x(n), then all the future samples can be computed recursively, with no error. That is, the process 
is fully predictable. 

Eq. (2.32) implies that if we pass the WSS random process x(n) through an FIR filter (see 
Fig. 2.4), 



V(z) = v§ +vfz~ 1 + ... + v£ 



(2.33) 



then the output y(n) is zero for all time! Its power spectrum ^(e-"*'), therefore, is zero for all to. 
Thus 



s yy (en = s xx (en\nen\=o 



(2.34) 



Because V(z) has at most L zeros on the unit circle (i.e., | V(e. JU) ) | has, at most, L distinct zeros in 
< lo < 27r), we conclude that S xx (e JU ) can be nonzero only at these points. That is, it has the 
form 



S xx (e juJ ) = 2tt ^2 c fia(u - Vi), < lo < 2tt, 



(2.35) 



i=i 



which is a linear combination of Dirac delta functions. This is demonstrated in Fig. 2.5. The 
autocorrelation of the process x(n), which is the inverse Fourier transform of S xx (c JU} ), then takes 
the form 

L 

R(k)=J2ae ju " A . (2.36) 



;=1 



A WSS process characterized by the power spectrum Eq. (2.35) (equivalently the autocorrelation 
(2.36)) is said to be a line spectral process, and the frequencies a;,- are called the line frequencies. 
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S xx (e ) 



Dirac delta 
functions 



t t 



+ 



CO 



2k 

FIGURE 2.5: Power spectrum of a line spectral process. 



Because the process is fully predictable, it is the exact "opposite" of a white process, which has no 
correlation between any pair of samples. In terms of frequency domain, for a white process, the 
power spectrum is constant, whereas for a fully predictable process, the power spectrum can only 
have impulses — it cannot have any smooth component. 

In Chapter 7, we present a more complete study of these and address the problem of 
identifying the parameters u>i and q when such a process is buried in noise. 

2.4.3 Determinant of the Autocorrelation Matrix 

We now show that the minimized mean square error Ej^ can be expressed directly in terms of the 
determinants of the autocorrelation matrices Rtv+i and R^v- This result is of considerable value in 
theoretical analysis. 

Consider the augmented normal equation Eq. (2.20) for the iVth-order predictor. We can 
further augment this equation to include the information about the lower-order predictors by 
appending more columns. To demonstrate, let N = 3. We then obtain four sets of augmented 
normal equations, one for each order, which can be written elegantly together as follows: 



R 4 x 



This uses the fact that R3 is a submatrix of R4, that is, 



1 


0" 




«3,1 1 







«3,2 «2,1 1 







a 3,3 «2,2 #1, 


I 1 . 





£3 ( X X X 



84 



X X 



£{ x 



/ 



£ 



/ 



(2.37) 



R 4 



R(0) x 
x R 3 



(2.38) 
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and similarly, R2 is a submatrix of R3 and so forth. The entries x are possibly nonzero but will not 
enter our discussion. Taking determinants, we arrive at 

det K.4 = u On £1 &n 1 

where we have used the fact that the determinant of a triangular matrix is equal to the product 
of its diagonal elements (Horn and Johnson, 1985). Extending this for arbitrary TV, we have the 
following result: 

detR J v 4 .i = £^ 1 ••• £ f - {2.39) 



In a similar manner, we have 



Taking the ratio, we arrive at 



•/ cf 



det R N = £ J N ^£ J N _ 2 . . . £J. (2.40) 



Thus, the minimized mean square prediction errors can be expressed directly in terms of the 
determinants of two autocorrelation matrices. In Section 6.5.2, we will further show that 

lim (detR 7V ) 1/7V = lim E f N (2.42) 

iV— ► 00 N— + 00 

That is, the limiting value of the mean squared prediction error is identical to the limiting value of 
(detR*,) 1 '". 

2.5 ESTIMATING THE AUTOCORRELATION 

In any practical application that involves linear prediction, the autocorrelation samples R(k) have 
to be estimated from (possibly noisy) measured data x(n) representing the random process. There 
are many methods for this, two of which are described here. 

The autocorrelation method. In this method, we define the truncated version of measured 
data 

. , f x(n) < n <L-1, 
xi\n) = < 

I outside, 

and compute its deterministic autocorrelation. Thus, the estimate of R(k) has the form 

L-\ 



R(k) = ^ x L (n)x£ (n-k). (2.43) 



n=0 
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There are variations of this method; for example, one could divide the summation by the number 
of terms in the sum (which depends on k) and so forth. From the estimated value R(i), we form 
the Toeplitz matrix Rjy and proceed with the computation of the predictor coefficients. 

A simple matrix interpretation of the estimation of R^v is useful. For example, if we have 
L = 5 and wish to estimate the 3x3 autocorrelation matrix R3, the computation (2.43) is 
equivalent to defining the data matrix 



X 



*(0) 








*(1) 


*(0) 





x(2) 


x(l) 


*(0) 


x(3) 


x(2) 


*(1) 


x(4) 


x(3) 


x{2) 





x(4) 


x(3) 








x{4) 



(2.44) 



and forming the estimate of R3 using 



Ri 



X f X)*. 



(2.45) 



This is a Toeplitz matrix whose top row is R(0), -R(l), • • • ■ Furthermore, it is positive definite by 
virtue of its form XX. So, all the properties from linear prediction theory continue to be satisfied 
by the polynomial A^(z). For example, we can use Levinson's recursion (Section 3.2) to solve for 
A N (z), and all zeros o£A N (z) are guaranteed to be in |z| < 1 (Appendix C). 

Note that the number of samples of x(n) used in the estimate of R(k) decreases as k increases, 
so the estimates are of good quality only if k is small compared with the number of available data 
samples L. There are variations of this method that use a tapering window on the given data instead 
of abruptly truncating it (Rabiner and Schafer, 1978; Therrien, 1992). 

The covariance method. In a variation called the "covariance method," the data matrix X is 
formed differently: 



X 



x(2) x(l) x(0) 
x(3) x(2) x(l) 
x(4) x(3) x(2) 



(2- 



and the autocorrelation estimated as R3 = (XX)*. If necessary we can divide each element of the 
estimate by a fixed integer so that this looks like a time average. 

In the covariance method, each R(k) is an average of N possibly nonzero samples of data, 
unlike the autocorrelation method, where the number of samples used in the estimate of R(k) 
decreases as k increases. So, the estimates, in general, tend to be better than in the autocorrelation 
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method, but the matrix Rjv in this case is not necessarily Toeplitz. So, Levinson's recursion 
(Section 3.2) cannot be applied for solving the normal equations, and we have to solve the normal 
equations directly. Another problem with non-Toeplitz covariances is that the solution A^{z) is 
not guaranteed to have all zeros inside the unit circle. 

More detailed discussions on the relative advantages and disadvantages of these methods 
can be found in many references (e.g., Makhoul, 1975; Rabiner and Schafer, 1978; Kay, 1988; 
Therrien, 1992). 

2.6 CONCLUDING REMARKS 

In this chapter, we introduced the linear prediction problem and discussed its solution. The solution 
appears in the form of a set of linear equations called the normal equations. An efficientway to solve 
the normal equations using a recursive procedure, called Levinson's recursion, will be introduced in 
the next chapter. This recursion will place in evidence a structure called the lattice structure for linear 
prediction. Deeper discussions on linear prediction will follow in later chapters. The extension of 
the optimal prediction problem for the case of vector processes, the MIMO LPC problem, will be 
considered in Chapter 8. 
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CHAPTER 3 




evinson's Recursion 




3.1 INTRODUCTION 

The optimal linear predictor coefficients a-^i are solutions to the set of normal equations given 
by Eq. (2.8). Traditional techniques to solve these equations require computations of the order of 
N (see, e.g., Golub and Van Loan, 1989). However, the N x N matrix R^ in these equations is 
not arbitrary, but Toeplitz. This property can be exploited to solve these equations in an efficient 
manner, requiring computations of the order of N . A recursive procedure for this, due to Levinson 
(1947), will be described in this chapter. This procedure, in addition to being efficient, also places 
in evidence many useful properties of the optimal predictor, as we shall see in the next several 
sections. 



3.2 DERIVATION OF LEVINSON'S RECURSION 

Levinson's recursion is based on the observation that if the solution to the predictor problem is 
known for order m, then the solution for order (m + 1) can be obtained by a simple updating 
process. In this way, we obtain not only the solution to the iVth-order problem, but also all the 
lower orders. To demonstrate the basic idea, consider the third-order prediction problem. The 
augmented normal Eq. (2.20) becomes 



R(0) R(l) R{2) R(3) 
R*{1) R{0) R(l) R{2) 
R*(2) R*(l) R(0) R(l 
R*(3) R*(2) R*(l) R(0) 





1 




~sf~\ 

°3 




#3,1 









«3,2 









_ a 33_ 




_ _ 



R4 



•■'; 



(3.1) 



This set of four equations describes the third-order optimal predictor. Our aim is to show how we 
can pass from the third-order to the fourth-order case. For this, note that we can append a fifth 
equation to the above set of four and write it as 
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1 




°3 




«3,1 









#3,2 


= 







#3,3 









_ _ 




_a 3 _ 



where 



/J(0) /?(1) /?(2) R(3) R(4) 
R*(l) R(0) R(l) R{2) R(3) 
R*(2) R*(l) R(0) R(l) R(2) 
R*(3) R*(2) R*(l) R{0) R(l) 
R*(4) R*(3) R*{2) R*(l) R(0) 



a 3 =R*(4) + « 3 ,i^*(3) + a 3a R%2) + a 3t3 R*(l). 



(3.2) 



(3.3) 



The matrix R5 above is Hermitian and Toeplitz. Using this, we verify that its elements satisfy 
(Problem 9) 



RZ- 



A-k 



Rii 



0<i,k<4. 



(3, 



In other words, if we reverse the order of all rows, then reverse the order of all columns and then 
conjugate the elements, the result is the same matrix! As a result, Eq. (3.2) also implies 



R(0) R(l) R(2) R(3) R(4) 
R*(l) R(0) R(l) R(2) R(3) 
R*(2) R*(l) R(0) R(l) R(2) 
R*(3) R*(2) R*(l) R(0) R(l) 
R*(4) R*(3) R*(2) R*(l) R(0) 










"a!" 




«3,3 









«3,2 


= 







«3.1 









1 




-°3 J 



(3.5) 



R 5 

If we now take a linear combination of Eqs. (3.2) and (3.5) such that the last element on the 
right-hand side becomes zero, we will obtain the equations governing the fourth-order predictor 
indeed! Thus, consider the operation 



Eq. (3.2) + kt x Eq. (3.5), 



where ^4 is a constant. If we choose 



-a 3 



£, 



f ' 



(3.6) 



(3.7) 
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then the result has the form 



R(0) R(l) R(2) R(3) R(4) 

R*(1)R(0) R(l) R(2) R(3) 

R*(2) R*(l) R{0) R{1) R{2) « 4 ,2 , (3.8) 

R*(3) R*(2) R*(l) R(0) R(l 

R*(4) R*(3) R*(2) R*(l) R(0) 

R 5 





1 




X 




«4,1 









«4,2 


= 







#4,3 









.#4,4. 




_0_ 



where 



#4,1 = #3,1 + ^4 #3,3 
#4,2 = #3,2 + k% #3,2 
#4,3 — #3,3 + k% #3,1 
#4 4 = ^4 . 

By comparison with Eq. (3.1), we conclude that the element denoted X on the right-hand side of 
Eq. (3.8) is £ 4 , which is the minimized forward prediction error for the fourth-order predictor. 
From the above construction, we see that this is related to Sj as 

■/_ cf 



£ 4 J = S{ + k% a* 3 



S{ - k A kt S{ (from Eq. (3.7)). 
(1 - \h\ 2 )S(. 



•/< 



Summarizing, if we know the coefficients #3 , and the mean square error Si for the third-order 
optimal predictor, we can find the corresponding quantities for the fourth-order predictor from the 
above equations. 

Polynomial Notation. The preceding computation of {#4,^} from {«3.^} can be written 
more compactly if we use polynomial notations and define the FIR filter 



(3.9) 



A,„(z) = 1 + a* m .\ z + a* m 2 z + . . . + a^ m z '" (prediction polynomial). 
With this notation, we can rewrite the computation of a^^ from a 3 j, as 

A A (z) =A 3 (z)+k 4 z- 1 [z- 3 A 3 (z)\, (3.10) 

where the tilde notation is as defined in Section 1,2. Thus, 

— m A ( v \ _i_ ry— 1 _i_ _i_ ry— {m— 1) I —in ( "7 1 l\ 
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3.2.1 Summary of Levinson's Recursion 

The recursion demonstrated above for the third-order case can be generalized easily to arbitrary 
predictor orders. Thus, letyf m (z) be the predictor polynomial for the mth-order optimal predictor 

r 

and E m the corresponding mean square value of the prediction error. Then, we can find the 
corresponding quantities for the (m + l)th-order optimal predictor as follows: 

-a* 



"m-\-l f 

fJ 

Of/! 



(3.12) 



where 



A m+1 (z) = A m (z) +k m+ iz 1 [z m A m (z)} (order update), (3.13) 



£' +1 = (1 - \k m+1 \ 2 )Sf, (error update), (3.14) 



a m = R*(m + 1) + a mA R\m) + a m . 2 R*(m - 1) + . . . + a m>m R*(l). (3.15) 



Initialization. Once this recursion is initialized for small m (e.g., m = 0), it can be used to 
solve the optimal predictor problem for any m. Note that for m = 0, the predictor polynomial is 

M*) = !> {3.16) 
and the error is, from Eq. (2.3), e§ («) = x(n). Thus, 

£ f =R(0). (3.17) 
Also, from the definition of a m , we have 

a =R*(l). (3.18) 
The above three equations are used to initialize Levinson's recursion. For example, we can compute 

h = -at /£ Q f =-R(l)/R(Q), (3.19) 

and evaluate A\(z) from Eq. (3.13) and so forth. □ 

3.2.2 The Partial Correlation Coefficient 

The quantity k t in Levinson's recursion has a nice "physical" significance. Recall from Section 2.3 
that the error eli(n) of the optimal predictor is orthogonal to the past samples 

x(n — 1) , x(n — 2) , . . . , x(n — m) . (3.20) 
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The "next older" sample x(n — m — 1), however, is not necessarily orthogonal to the error, that 
is, the correlation E\e m (n)x*(n — m — 1)] could be nonzero. It can be shown (Problem 6) that the 
quantity OLm in the numerator of k m+ \ satisfies the relation 



«,; 



E\e f m (n)x*(n - m - \)\. (3.21) 



The coefficient k m+ \ represents this correlation, normalized by the mean square error Eli- 
The correction to the prediction polynomial in the order-update equation (second term in 
Eq. (3.13)) is proportional to this normalized correlation. 

In the literature, the coefficients ft; have been called the partial correlation coefficients and 
abbreviated as parcor coefficients. They are also known as the lattice or reflection coefficients, for 
reasons that will become clear in later chapters. 

Example 3.1: Levinson's Recursion. Consider again the real WSS process with the auto- 
correlation (2.12). We initialize Levinson's recursion as described above, that is, 

A (z) = 1, a = R(l) = 1.5, and £ f = R(0) = 2.1. (3.22) 

We can now compute the quantities 

h = -a /£ f = -5/7 
A x {z) = A (z) +k 1 z- 1 A (z) = 1 - (5/7)z~\ 

e(= (\-k\)e(= 36/35. 

We have now obtained all information about the first-order predictor. To obtain the second-order 
predictor, we compute the appropriate quantities in the following order: 

«i = R(2) + a 1A R(l) = (9/10) - (5/7) x 1.5 = -6/35 
k 2 = -a l /£{=l/(^ 
A 2 (z) =A l (z)+k 2 z- 2 A l (z) = 1 - (5/6)z- : + (l/6)z" 2 
£{= (1 -%)£{= 1.0. 

We can proceed in this way to compute optimal predictors of any order. The above results agree 
with those of Example 2.1, where we used a direct approach to solve the normal equations. 

3.3 SIMPLE PROPERTIES OF LEVINSON'S RECURSION 

Inspection of Levinson's recursion reveals many interesting properties. Some of these are discussed 
below. Deeper properties will be studied in the next several sections. 

1. Computational complexity. The computation of A m+ i(z) from A m (z) requires nearly m 
multiplication and addition operations. As a result, the amount of computation required 
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for N repetitions (i.e., to find «jv,<) is proportional to N . This should be compared with 
traditional methods for solving Eq. (2.8) (such as Gaussian elimination), which require 
computations proportional to N . In addition to reducing the computation, Levinson's 
recursion also reveals the optimal solution to all the predictors of orders < TV. 

2. Parcor coefficients are bounded. Because Em is the mean square value of the prediction error, 
we know E, n > for any m. From Eq. (3.14), it, therefore, follows that |<£,-| 2 < 1, for any i. 

3. Strict bound on parcor coefficients. Let |^ m +i| = 1. Then Eq. (3.14) says that S^ +1 = 0. In 
other words, the mean square value of the prediction error sequence e m+l (n) is zero. This 
means that e^ +1 (n) is identically zero for all n. In other words, the output of A m+ i(z) in 
response to x[n) is zero (Fig. 2.1(a)). Using an argument similar to the one in Section 2.4.2, 
we conclude that this is not possible unless x(n) is a line spectral process. Summarizing, we 
have 

\k\ < 1, for any i, (3.23) 

unless x(n) is a line spectral process. Recall also that the assumption that x(n) is not 
line spectral also ensures that the autocorrelation matrix R/v in the normal equations is 
nonsingular for all N (Section 2.4.2). 

4. Minimized error is monotone. From Eq. (3.14) it also follows that the mean square error is a 
monotone function, that is, 

&U ^ £ L {3.24) 

with equality if and only if k m+ \ =0. It is interesting to note that repeated application 
of Eq. (3.14) yields the following expression for the mean square prediction error of the 
N th-order optimal predictor: 

2 Ui U__ |2\ (-i \L\2\cf 



£{,= {!- \k N \ 2 ) (i - |^-i| 2 ) ... (1 - N 2 )£, 



o 



[1 - |^| 2 ) (1 - |^v-i| 2 ) ... (1 - \h\ 2 )R{0) 

Example 3.2: Levinson's recursion. Now, consider a WSS process x(n) with power spectrum 
S xx (e JU ') as shown in Fig. 3.1 (top). The first few samples of the autocorrelation are 

R(0) R(l) R{2) R(3) R(4) R{5) R{6) 

0.1482 0.0500 0.0170 -0.0323 -0.0629 0.0035 -0.0087 
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FIGURE 3.1: Example 3.2. The input power spectrum S xx (e JUJ ) (top), the prediction error £„, plotted 
for a few prediction orders m (middle), and the prediction error £,{ shown for more orders (bottom). 
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If we perform Levinson's recursion with this, we obtain the parcor coefficients 

k\ k 2 k% k 4 k 5 k 6 

-0.3374 -0.0010 0.2901 0.3253 -0.3939 0.1908 

for the first six optimal predictors. Notice that these coefficients satisfy \k m \ < 1 as expected from 
theory. The corresponding prediction errors are 

pi pi pi pi pi pi pf 

°0 °1 °2 °3 °4 °S °6 

0.1482 0.1314 0.1314 0.1203 0.1076 0.0909 0.0876 

The prediction errors are also plotted in Fig. 3.1 (middle). The error decreases monotonically as the 
prediction order increases. The same trend continues for higher prediction orders as demonstrated 
in the bottom plot. The optimal prediction polynomials Aq(z) through A(,(z) have the coefficients 
given in the following table: 



4>(z) 


1.0 












A(z) 


1.0 


-0.3374 










M*) 


1.0 


-0.3370 


-0.0010 








M*) 


1.0 


-0.3373 


-0.0988 


0.2901 






A 4 {z) 


1.0 


-0.2429 


-0.1309 


0.1804 


0.3253 




M*) 


1.0 


-0.3711 


-0.2020 


0.2320 


0.4210 


-0.3939 


M*) 


1.0 


-0.4462 


-0.1217 


0.2762 


0.3825 


-0.4647 



0.1908 

The zeros of these polynomials can be verified to be inside the unit circle, as expected from our 
theoretical development (Appendix C). For example, the zeros of A(,(z) are complex conjugate 
pairs with magnitudes 0.9115, 0.7853, and 0.6102. 

3.4 THE WHITENING EFFECT 

Suppose we compute optimal linear predictors of increasing order using Levinson's recursion. We 
know the prediction error decreases with order, that is, £k+\ < £k- Assume that after the predictor 
has reached the order m, the error does not decrease any further, that is, suppose that 

££=£ f ..=£ f .~ = e f .*= ... (3.25) 

m m+1 m+2 m+1 \y.~-> j 

This represents a stalling condition, that is, increasing the numberof past samples does not help to 
increase the prediction accuracy any further. Whenever such a stalling occurs, it turns out that the 
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error e m (n) satisfies a very interesting property, namely, any pair of samples are mutually orthogonal. 
That is, 



f( n )[ e f(n + i)]* = \ ef \ T n (3.26) 




In particular, if x(n) has zero mean, then einin) will also have zero mean, and the preceding equation 
means that eL(n) is white. Now, from Section 2.2, we know that x(n) can be represented as the 
output of a filter, excited with e m (n) (see Fig. 2.1(b)). Thus, whenever stalling occurs, x(n) is the 
output of an all-pole filter 1/A,„(z) with white input; such a zero-mean process x(n) is said to be 
autoregressive (AR). We will present a more complete discussion of AR processes in Section 5.2. 
For the case of nonzero mean, Eq. (3.26) still holds, and we say that eln(n) is an orthogonal (rather 
than white) random process. 

Theorem 3.1. Stalling and whitening. Consider the optimal prediction of a WSS process 
x(n), with successively increasing predictor orders. Then, the iteration stalls after the wth-order 
predictor (i.e., the mean square error stays constant as shown by Eq. (3.25)), if and only if the 
prediction error e„,(n) satisfies the orthogonality condition Eq. (3.26). <0> 

Proof. Stalling implies, in particular, that £„ +1 = £m, that is, k m+ \ = (from Eq. (3.14)). 
As a result, A m+ \ (z) = A m (z). Repeating this, we see that 

A m (z) =A m+1 (z) =A m+2 {z) = . .. (3.27) 

The prediction error sequence e^iri), therefore, is the same for all £ > m. The condition k m+ i = 
implies a m = from Eq. (3.12). In other words, the cross-correlation Eq. (3.21) is zero. Repeating 
this argument, we see that whenever stalling occurs, we have 

E [ e l+M x *( n -m-£-l)} = 0, £>0. (3.28) 

f f 

But e m+t (n) = e m (n) for any £ > 0, so that 

E\el(n)x*(n - m - £ - 1)} = 0, £ > 0. (3.29) 

By orthogonality principle, we already know that E[e/„(n)x*(n — i)] = Oforl < i < m. Combining 
the preceding two equations, we conclude that 

E[el(n)x*(n -£)]=0, £>1. (3.30) 
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r 

e m {n) is a linear combination of present and past samples of x(n), that is, 



In other words, the error em{n) is orthogonal to all the past samples of x{n). We also know that 

present and past samples of x{n), that is, 

m 
e m( n ) ~ X ( n ) + 2^ a mJ X ( n ~ 0- {3.31) 



Similarly, 



£ m( n — ^) = x ( w — ^) + 7, a »»,i *( w — ^ — 0- {3.32) 

i=i 

Because e m {n) is orthogonal to all the past samples of #(«), we, therefore, conclude that e m {ii) 

f f f 

is orthogonal to e m {n — £),£> 0. Summarizing, we have proved, E\e m {n)\e m {n — £)]*] = for 

£ > 0. This proves Eq. (3.26) indeed. By reversing the above argument, we can show that if e m {n) 

has the property (3.26) then the recursion stalls (i.e., Eq. (3.25) holds). □ 

Example 3.3: Stalling and Whitening. Consider a real WSS process with autocorrelation 

R{k)=p\\ {3.33) 

where —1 < p < 1. The first-order predictor coefficient an is obtained by solving 

R{0)a u = -R{1), {3.34) 

so that #11 = —p. Thus, the optimal predictor polynomial is A\{z) = 1 — pz~ . To compute the 
second-order predictor, we first evaluate 

e»i = R{2) + a 1A R{l) = p 2 - p 2 = 0. {3.35) 

Using Levinson's recursion, we find ^2 = — a * I &\ = 0> so that ^z{ z ) = ^i(z). To find the 
third-order predictor, note that 

a 2 = R{3) + a 2A R{2) + a 2a R{l) = R{3) - pR{2) = 0. {3.36) 

Thus, kj, = and Aj,{z) = A\{z). No matter how far we continue, we will find in this case that 
A m {z) = Ai{z) for all m. That is, the recursion has stalled. Let us double-check this by testing 
whether e{ («) satisfies Eq. (3.26). Because e^{n) is the output of ^i(z) in response to x{n), we 
have 

e({n) = x{n) — px{n — 1). {3.37) 
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Thus, for k > 0, 

E[e{{n)e{(n - k)\ = R{k) + p 2 R{k) - pR{k - 1) - pR(k + 1) = 0, (3.38) 

as anticipated. Because R(oo) = 0, the process x(n) has zero mean. So, e((n) is a zero-mean white 
process. 

3.5 CONCLUDING REMARKS 

Levinson's work was done in 1947 and, in his own words, was a "mathematically trivial procedure." 
However, it is clear from this chapter that Levinson's recursion is very elegant and insightful. It can 
be related to early work by other famous authors (e.g., Chandrasekhar, 1947). It is also related to 
the Berlekamp— Massey algorithm (Berlekamp, 1969). For a fascinating history, the reader should 
study the scholarly review by Kaliath (1974; in particular, see p. 160). The derivation of Levinson's 
recursion in this chapter used the properties of the autocorrelation matrix. However, the method 
can be extended to the case of Toeplitz matrices, which are not necessarily positive definite (Blahut, 
1985). 

In fact, even the Toeplitz structure is not necessary if the goal is to obtain an 0(N ) algorithm. 
In 1979, Kailath et al. introduced the idea of displacement rank for matrices. They showed that as 
long as the displacement rank is a fixed number independent of matrix size, 0(N ) algorithms can 
be found for solving linear equations involving these matrices. It turns out that Toeplitz matrices 
have displacement rank 2, regardless of the matrix size. The same is true for inverses of Toeplitz 
matrices (which are not necessarily Toeplitz). 

One of the outcomes of Levinson's recursion is that it gives rise to an elegant structure for 
linear prediction called the lattice structure. This will be the topic of discussion for the next chapter. 
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CHAPTER 4 

Lattice Structures for Linear 
Prediction 



4.1 INTRODUCTION 

In this chapter, we present lattice structures for linear prediction. These structures essentially 
follow from Levinson's recursion. Lattice structures have fundamental importance not only in linear 
prediction theory but, more generally, in signal processing. For example, they arise in the theory 
of all-pass filters: any stable rational all-pass filter can be represented using an IIR lattice structure 
similar to the IIR LPC lattice. As a preparation for the main topic, we first discuss the idea of 
backward linear prediction. 

4.2 THE BACKWARD PREDICTOR 

Let x{n) represent a WSS random process as usual. A backward linear predictor for this process 
estimates the sample x[n — N — 1) based on the "future values" x[n — 1), ... , x[n — TV). The 
predicted value is a linear combination of the form 

N 

x h N {n-N-l) = -^2^ i x{n-i), (4.1) 



and the prediction error is 



L N 



(n) =x(n-N-l) -x N (n-N- 1) 



N 



= > b%^x{n — i) + x(n — N — 1). 



i=i 



The superscript b is meant to be a reminder of "backward." Figure 4.1 demonstrates the difference 
between the forward and backward predictors. Notice that both x N (n) and x N (n — N — 1) are based 
on the same set of N measurements. The backward prediction problem is primarily of theoretical 
interest, but it helps us to attach a "physical" significance to the polynomial z~ m A m [z) in Levinson's 
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observed data 



backward predictor 
estimates this 





forward predictor 
estimates this 



time 
— ► 



n—N—1 n-N n — 1 n 

FIGURE 4.1: Comparison of the forward and backward predictors. 



recursion (3.13), as we shall see. As in the forward predictor problem, we again define the predictor 
polynomial 



A' 



B N (z) = ^2^ i z- i + z 



(7V+1) 



(4.2) 



r-l 



The output of this FIR filter in response to x(n) is equal to the predictor error e N (n). 

To find the optimal predictor coefficients b%j , we again apply the orthogonality principle, 
which says that ej^(n) must be orthogonal to x(n — k), for 1 < k < N. Using this, we can show that 
the solution is closely related to that of the forward predictor of Section 2.3. With a N j denoting 
the optimal forward predictor coefficients, it can be shown (Problem 15) that 



f>N,i = a *N.N+l-i, 1 < i < N, 



(4.3) 



This equation represents time-reversal and conjugation. Thus, the optimum backward predictor 
polynomial is given by 



B N (z)=z-( N+1 U N (z), 



(4.4) 



where the tilde notation is as defined in Section 1.2.1. Figure 4.2 summarizes how the forward and 
backward prediction errors are derived from x(n) by using the two FIR filters, A^(z) and B^(z). It 
is therefore clear that we can derive the backward prediction error sequence fy(n) from the forward 
prediction error sequence eiXn) as shown in Fig. 4.3. In view of Eq. (4.4), we have 



B N (z) = z-^A N (z) 
A N (z) A N (z) 



(4.5) 
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-> 4< n ) 



B N (z) 

backward prediction filter 



■> <w 



FIGURE 4.2: The forward and the backward prediction filters. 



4.2.1 All-Pass Property 

We now show that the function 



G N (i 






is all-pass, that is, 

To see this, simply observe that 

G N (z)G N (z) 
But because 



\G N (e J ")\ = l, Vw. 



z»A N (z) z-»M*) 1 w 
X ^-r~, — = 1, V Z 



A N (z) 4w(z) 



(4.6) 



(4.7) 



H N {^)=HN{t Ju ), 

the preceding implies that |G7v(e yw )| = 1, which proves Eq. (4.7). From Fig. 4.3, we therefore 
see that the backward error e N (n) is the output of the all-pass filter z~ G^(z) in response to the 
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FIGURE 4.3: The backward prediction error sequence, derived from the forward prediction error 
sequence. 
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input e N (n), which is the forward error. The power spectrum of ejj{n) is therefore identical to that 
of e h N (n). In particular, therefore, the mean square values of the two errors are identical, that is, 

where S^ = £[|^(«)| 2 ] and £^ = E[\e^(n)\ 2 ]. Notice that the all-pass filter (4.6) is stable because 
the zeros of A^(z) are inside the unit circle (Appendix C). 

4.2.2 Orthogonality of the Optimal Prediction Errors 

Consider the set of backward prediction error sequences of various orders, that is, 



e h (n), e\(n), e\{n),... 



(4.9) 



We will show that any two of these are orthogonal. More precisely, 

Theorem 4.1. Orthogonality of errors. The backward prediction error sequences satisfy the 
property 



,(»)[«* (»)F 



for k ^ m 
6°, iotk = m. 



(4.10) 



foralU, m > 0. 

Proof. It is sufficient to prove this for m > k. According to the orthogonality principle, e m (n) 



orthogonal to 



From its definition, 



x(n — 1) , . . . , x(n — m) . \4-H) 



e i( n ) = x ( n ~ & ~ 1) + bk.\ x ( n ~ 1) + bi,2 x ( n ~ 2) + ••• -\-bk,k x \ n ~ &)■ 

From these two equations, we conclude that E[e m (n) [e k (n)\*\ = 0, for m > k. From this result, Eq. 
(4.10) follows immediately. □ 

In a similar manner, it can be shown (Problem 16) that the forward predictor error sequences 
have the following orthogonality property: 



J(n)\e{(n-\)X 



0, m > 



.12) 



The above orthogonality properties have applications in adaptive filtering, specifically in improving 
the convergence of adaptive filters (Satorius and Alexander, 1979; Haykin, 2002). The process 
of generating the above set of orthogonal signals from the random process x(n) has also been 
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interpreted as a kind of Gram— Schmidt orthogonalization (Haykin, 2002). These details are beyond 
scope of this chapter, but the above references provide further details and bibliography. 

4.3 LATTICE STRUCTURES 

In Section 3.2, we presented Levinson's recursion, which computes the optimal predictor polynomial 
A m+ \(z) fromyf m (z) according to 

A m+1 (z) = A m (z) + k m+l z- l [z~ m A m (z)} (4.13) 

On the other hand, we know from Section 4.2, that the optimal backward predictor polynomial is 
given by Eq. (4.4). We can therefore rewrite Levinson's order update equation as 



From this, we also obtain 



A m+ \{z) =A m (z) + k m+1 B m (z). 



B m+1 (z) = z [k* m+1 A m (z) +B m (z)], 



.14) 



(4.15) 



by using B m+ \ (z) = z~ < -~ m+2 ^A m+ i(z). The preceding relations can be schematically represented as 
in Fig. 4.4(a). Because e m (n) and e m (n), are the outputs of the filters A m (z) and B m (z) in response 
to the common input x(n), the prediction error signals are related as shown in Fig. 4.4(b). By 



A m (z) 



B m (z) 




► A m+ i(z) 



(a) 




► B m + \ ( z ) 



► <+l<») 



(b) 



►Cm^ 



FIGURE 4.4: (a) The lattice section that generates the (in + l)th-order predictor polynomials from 
the wth-order predictor polynomials, (b) The same system shown with error signals indicated as the 
inputs and outputs. 
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FIGURE 4.5: (a) The FIR LPC lattice structure that generates the prediction errors eL(ii) and e m {n 
for all optimal predictors with order 1 < m < N. (b) Details of the mth lattice section. 



using the facts that Aq(z) = 1 and Bq(z) = z~ , we obtain the structure of Fig. 4.5(a). Here, 
each rectangular box is the two multiplier lattice section shown in Fig. 4.5(b). In this structure, 
the transfer functions A„,(z) and B m (z) are generated by repeated use of the lattice sections in 
Fig. 4.4. This is a cascaded FIR lattice structure, often referred to as the LPC FIR lattice. If we 
apply the input x(n) to the FIR lattice structure, the optimal forward and backward prediction 
error sequences e m {n) and e m {n) appear at various nodes as indicated. Lattice structures for linear 
prediction have been studied in detail by Makhoul (1977). 

4.3.1 The HR LPC Lattice 

From Fig. 4.4(b), we see that the prediction errors are related as 



e m+ 



i(") =e£(») +K+\e h m {n), 



c m+l 



(») 



<m+\ 



-Li"- 1 ) +^("- 1 )- 



Suppose we rearrange the first equation as eh{n) = e m+ \ { n ) ~ k m +\e m (n) , then these two equations 
become 



D m+1 



-m( n ) = e {,+i( n ) ~ k m +\e h m (n), 
(n) = k* m+x el(n - 1) + e h m (n - 1). 



These equations have the structural representation shown in Fig. 4.6(a). Repeated application of 
this gives rise to the IIR structure of Fig. 4.6(b). This is precisely the IIR all-pass lattice structure 
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FIGURE 4.6: (a) The IIR LPC lattice section, (b) The IIR LPC lattice structure that generates the 
signal x(n) from the optimal prediction error e^(n). The lower-order forward prediction errors e£,(n) 
and the backward prediction errors are also generated in the process automatically. 



studied in signal processing literature (Gray and Markel, 1973; Vaidyanathan, 1993; Oppenheim 
and Schafer, 1999). Thus, if we apply the forward prediction error e N (n) as an input to this structure, 
then the forward and backward error sequences of all the lower-order optimal predictors appear at 
various internal nodes, as indicated. In particular, the Oth-order prediction error e^ («) appears at 
the rightmost node. Because e^ (n) = x{n), we therefore obtain the original random process x{n) 
as an output of the IIR lattice, in response to the input e N (n). We already showed that the transfer 
function 



-l 



G N (z) = Z -(^^M 
A N (z) 



is all-pass. This is the transfer function from the input eiin) to the node e° N (n) in the IIR lattice. 
The transfer function from the input to the rightmost node in Fig. 4.6(b) is the all-pole filter 
\/An(z), because this transfer function produces x(n) in response to the input e N (ti) (see Fig. 4.2). 

4.3.2 Stability of the IIR Filter 

It is clear from these developments that the "parcor" coefficients k m derived from Levinson's 
recursion are also the lattice coefficients for the all-pass function z A-^(z) j A-^(z) . Now, it is well 
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known from the signal processing literature (Gray and Markel, 1973; Vaidyanathan, 1993) that the 
polynomial A?j{z) has all its zeros strictly inside the unit circle if and only if 

\k m \ < 1, 1 < m <N. (4.16) 

In fact, under this condition, all the polynomials A m (z), 1 < m < N have all zeros strictly inside 
the unit circle. 

On the other hand, Levinson's recursion (Section 3.2) has shown that the condition \k m \ < 1 
is indeed true as long as x(n) is not a line spectral process. This proves that the IIR filters 1/A m (z) 
and z~ m A m (z) /A m (z) indeed have all poles strictly inside the unit circle so that Fig. 4.6(b) is, in 
particular, stable. Although the connection to the lattice structure is insightful, it is also possible 
to prove stability of \/A m (z) directly without the use of Levinson's recursion or the lattice. This is 
done in Appendix C. 

4.3.3 The Upward and Downward Recursions 

Levinson's recursion (Eq. (4.14) and (4.15)) computes the optimal predictor polynomials of 
increasing order, from a knowledge of the autocorrelation R(k) of the WSS process x(n). This is 
called the upward recursion. These equations can be inverted to obtain the relations 

(1 - |4»+i| 2 ) x A,„(z) =A m+1 (z) - k m+1 zB m+1 (z), 
(1 - \k m +i\ 2 ) x B m (z) = -k* m+1 A m+1 (z) + zB m+1 (z). {4.17) 

Given A m +i{z), the polynomial B m+ \(z) is known, and so is 

k m +\ = a m +i,m+i- {4-18) 

Using this, we can uniquely identify A m (z), hence, B m (z), from (4.17). Thus, if we know the 
A^th-order optimal prediction polynomial Aj^{z), we can uniquely identify all the lower-order 
optimal predictors. So Eq. (4.17) is called the downward recursion. This process automatically 
reveals the lower-order lattice coefficients ki. If we know the prediction error Sjy, we can therefore 
also identify the lower-order errors £f, by repeated use of Eq. (3.14), that is, 

£l=£,{ +1 /(l-\k m+1 \ 2 ) {4.19) 
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4.4 CONCLUDING REMARKS 

The properties of the IIR lattice have been analyzed in depth by Gray and Markel (1973), and lattice 
structures for linear prediction have been discussed in detail by Makhoul (1977). The advantage of 
lattice structures arises from the fact that even when the coefficients k m are quantized the IIR lattice 
remains stable, as long as the quantized numbers continue to satisfy \k m \ < 1. This is a very useful 
result in compression and coding, because it guarantees stable reconstruction of the data segment 
that was compressed. We will return to this point in Sections 5.6 and 7.8. 
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CHAPTER 5 

utoregressive Modeling 



5.1 INTRODUCTION 

Linear predictive coding of a random process reveals a model for the process, called the autoregressive 
(AR) model. This model is very useful both conceptually and for approximating the process with a 
simple model. In this chapter, we describe this model. 

5.2 AUTOREGRESSIVE PROCESSES 

A WSS random process wyn) is said to be autoregressive (AR) if it can be generated by using the 
recursive difference equation 

N 



j(n) = — y d*w(n — i) + e(n), (5--?) 



;=i 



where 



is a zero-mean white WSS process, and 



,A r 



2. the polynomial -D(z) = 1 + Xw=i ^> z ' nas a ^ zer os inside the unit circle. 

In other words, we can generate w(n) as the output of a causal, stable, all-pole IIR filter 1/D(z) in 
response to white input (Fig. 5.1). If d^ 7^ 0, we say that the process is AR(N) , that is, AR of order 
N. Because e{n) has zero mean, the AR process has zero mean according to the above definition. 

Now, how does the AR process enter our discussion of linear prediction? Given a WSS 
process x(n), let us assume that we have found the iVth-order optimal predictor polynomial An(z). 
We know, we can then represent the process as the output of an IIR filter as shown in Fig. 2.1(b). 
The input to this filter is the prediction error e^(n). In the time domain, we can write 

N 



J2 a h x ( n ~ + e iv(") • i 5 - 2 ) 



;=1 



42 THE THEORY OF LINEAR PREDICTION 



e(n) 



white process 



\/D(z) 
MR allpole filter 



+ w(n) 



AR process 



FIGURE 5.1: Generation of an AR process from white noise and an all-pole IIR filter. 



In Section 3.4, we saw that if the error stalls, that is, Em does not decrease anymore as m increases 
beyond some value N, then e N (n) is white (assuming x(n) has zero mean). Thus, the stalling 
phenomenon implies that x{n) is AR(N\ 

Summarizing, suppose the optimal predictors of various orders for a zero-mean process x{n) 
are such that the minimized mean square errors satisfy 

S{ >£/> ... > 4-1 >£n = £l m > N - {5.3) 

Then, x(n)is AR(N). 

Example 5.1: Levinson's recursion for an AR process. Consider a WSS AR process 
generated by using an all-pole filter c/D(z) with complex conjugate pairs of poles at 

0.8e ±a2 ' /7r ,0.85e ±0 - 4/7r 

so that 

D{z) = 1.0000 - 1.8198Z" 1 +2.0425z" 2 - 1.2714z" 3 + 0.4624z~ 4 . 

The power spectrum S ww (e.^) of this AR(4) process is shown in Fig. 5.2 (top plot). The constant 
c in c/D{z) is such that S ww (e JU} ) has maximum magnitude unity. The first few autocorrelation 
coefficients for this process are as follows: 

R(0) R(l) R{2) R{3) R(4) R{5) 

0.2716 0.1758 -0.0077 -0.1091 -0.0848 -0.0226 

If we perform Levinson's recursion with this, we obtain the lattice coefficients 



*i n *3 *4 us *6 

-0.6473 0.7701 -0.5469 0.4624 0.0000 0.0000 
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FIGURE 5.2: Example 5.1. The input power spectrum S lmo (e JUJ ) of an AR(4) process (top) and the 
prediction error £,{, plotted for a few prediction orders m (middle). The prediction error for the non-AR 
process of Ex. 3.2 is also reproduced (bottom). 
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for the first six optimal predictors. Notice that these coefficients satisfy \k m \ < 1 as expected from 
theory. The optimal prediction polynomials Aq{z) through A(,(z) have the coefficients given in the 
following table: 



A(z) 


1.0 














A r (z) 


1.0 


-0.6473 

















M*) 


1.0 


-1.1457 


0.7701 














M*) 


1.0 


-1.5669 


1.3967 


-0.5469 











A 4 (z) 


1.0 


-1.8198 


2.0425 


-1.2714 


0.4624 








A 5 (z) 


1.0 


-1.8198 


2.0425 


-1.2714 


0.4624 








A 6 {z) 


1.0 


-1.8198 


2.0425 


-1.2714 


0.4624 









The prediction filter does not change after^4(z) because the original process is AR(4). This is also 
consistent with the fact that the lattice coefficients past k 4 are all zero. The prediction errors are 
plotted in Fig. 5.2 (middle). The error £ m decreases monotonically and then becomes a constant for 
m > 4. For comparison, the prediction error £ m for the non-AR process of Ex. 3.2 is also shown 
in the figure (bottom); this error continues to decrease. 

5.3 APPROXIMATION BY AN AR(N) PROCESSES 

Recall that an arbitrary WSS process x{n) (AR or otherwise), can always be represented in terms 



of the optimal linear prediction error e N {n) as in Eq. (5.2). In many practical cases, the error e N {n) 
tends to be nearly white for reasonably large N. If we replace e^in) with zero-mean white noise 
e(n) having the mean square value E^, we obtain the so-called AR model y(n) for the process x(n). 



(a) 



4w 



UA N (z) 



MR allpole filter 



+ x(n) 

original 
process 



e(n) 

'") white process 



UA N (z) 



+ y(n) 



IIR allpole filter 



AR approximation 



FIGURE 5.3: Generation of the AR(N) approximation y(n) for a WSS process x(n). (a) The exact 
process x{n) expressed in terms of the TVth-order LPC parameters and (b) the ARiN) approximation 
y{n). 
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This is shown in Fig. 5.3. The process y(n) satisfies 

TV 

y( n ) = — /] «N,iy{n - i) + e(n) (5.4) 

white 

The model process y(n) is also called the AR(N) approximation (iVth-order autoregressive 
approximation) of x(n). 

5.3.1 If a Process is AR, Then LPC Will Reveal It 

If a zero-mean WSS process x(n) is such that the optimal LPC error ej^(n) is white, then we know 
that x(n) is AR. Now, consider the converse. That is, let x(n) be AR(N). This means that we can 
express it as 

TV 

x(n) = — 2, c N.i x ( n ~ + e ( w )> {5-5) 

<=l 

for some set of coefficients CN,i, where e(n) is white. This is a causal difference equation, so that, at 
any time n, the sample x(n) is a linear combination 

x(n) =g(0)e(n) +g(l)e(n-l) +g(2)e(n-2) + ... (5.6) 

Similarly, x(n — i) is a linear combination of 

e(n — i) , e(n — i — 1) . . . 

Now consider 

E[e(n)x*(n — z)] = E e(n)[g(Qi)e(n — i) + g(l)e(n — i — 1) + . . .] 

Because e(n) is white (i.e., E\e(n)e [n — £)] = 0, £ > 0), this means that 

E[e(n)x*(n - i)] =0, i > 0. (5. 7) 

In other words, e(n) is orthogonal to all the past samples of x[n). In view of the orthogonality 
principle, this means that the linear combination 

TV 
<=1 

is the optimal linear prediction of x(n). As the solution to the optimal iVth-order prediction 
problem is unique, the solution {a^i\ to the optimal predictor problem is therefore equal to c^j. 
Thus, the coefficients ov,; of the AR(N) process can be identified simply by solving the iVth-order 
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optimal prediction problem. The resulting prediction error e N (n) is white and equal to the "input 
term" e{n) in the AR equation Eq. (5.5). Summarizing, we have: 

Theorem 5. \. LPC of an ARprocess . Letx(w) be znARiN) WSS process, that is, a zero-mean 
process representable as in Eq. (5.5), where e\n) is white. Then, 

1. The AR coefficients cfy; can be found simply by performing TVth-order LPC on the process 
x(n). The optimal LPC coefficients a^ •,• will turn out to be cjv,\ Thus, the AR model y{n) 
resulting from the optimal LPC (Fig. 5.3) is the same as x(n), that is y(n) = x(n). 

2. Moreover, the prediction error e^(n) is white and the same as the white noise sequence e(n) 
appearing in the AR(N) description Eq. (5.5) of x{n). <0> 

5.3.2 Extrapolation of the Autocorrelation 

Because e N [n) is white, the autocorrelation R(i) of an AR(N) process x[n) is completely 
determined by the filter 1/Aj^(z) and the variance £jy. Because the quantities Aj^(z) and £jy can be 
found from the A^ + 1 coefficients 

R(0),R(1),...,R(N) 

using Levinson's recursion, we conclude that for AR(N) processes, all the coefficients R(k) , \k\ > N 
are determined by the above N -\- 1 coefficients. So, the determination of A^r(z) and £^ provides 
us a way to extrapolate an autocorrelation. In other words, we can find R(k) for \k\ > Af from the 
first iV + 1 values in such a way that the Fourier transform S xx (e JUJ ) of the extrapolated sequence is 
nonnegativefor all uo. 

Extrapolation equation. An explicit equation for extrapolation of R{k) can readily be written 
down. Thus, given R(0), R{^), • • .R(N) for an AR(N) process, we can find R(k) for any k > N 
using the equation 

R(k) = -a* N1 R(k - 1) - a* N2 R{k - 2) ... - a* NN R(k - N). 

This equation follows directly from normal equations. See Appendix D for details. □ 

In many practical situations, a signal x{n) that is not AR can be modeled well by an AR 
process. A common example is speech (Markel and Gray, 1976; Rabiner and Schafer, 1978; Jayant 
and Noll, 1984). The fact that x(n) is not AR means that the prediction error e m (n) never gets 
white, but its power spectrum often gets flatter and flatter as m increases. This will be demonstrated 
later in Section 6.3 where we define a mathematical measure for spectral flatness and study it in the 
context of linear prediction. 
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5.4 AUTOCORRELATION MATCHING PROPERTY 

Let x(n) be a WSS process, Aj^yz) be its optimal iVth-order predictor polynomial, and y(n) be 
the AR(N) approximation of x(n) generated as in Fig. 5.3. In what mathematical sense does y(n) 
approximate x(n), that is, in what respect is y(n) "close" to x(n)} We now provide a quantitative 
answer: 

Theorem 5.2. Let R(i) and r(A) be the autocorrelations of x(n) and the AR(N) approxima- 
tion y(ri), respectively. Then, 

R(k) =r(k), \k\ <N. {5.8) 

Thus, the AR(N) model y(n) is an approximation of x(n) in the sense that the first N-\-l 
autocorrelation coefficients for the two processes are equal to each other. (} 

So, as the approximation order N increases, more and more values of R(k) are matched to 
r(k). If the process x(n) is itself AR(N), then we know (Theorem 5.1) that the AR(N) model 
y(n) satisfies y(n) = x(n) so that Eq. (5.8) holds for all k, not just \k\ < N. A simple corollary of 
Theorem 5.2 is 

R(0) = r(0), 

which can be used to prove that for any WSS process x(n), 



m = £ «['vi^WT, < 5 - 



To see this, observe first that the AR process y{n) in Fig. 5.3(b) is generated using white noise e{n) 
with variance £^, so that 

{> J K(e^)| 2 27r 

But because r(0) = R(0), Eq. (5.9) follows readily. 

Proof of Theorem 5.2. By construction, the AR(N) process y{n) is representable as in 
Fig. 5.3(b) and therefore satisfies the difference equation Eq. (5.4). Now consider the sample 
y(n — k). Because Fig. 5.3(b) is a causal system, 

y{n — k) = linear combination of e{n — k — i), i > 0. (5.10) 

But because e(n) is zero-mean white, we have E\e(n)e*\n — k)] = for k ^ 0. Thus, 

E[e(n)y*(n - k)} = 0, k>0. (5.11) 
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Multiplying Eq. (5.4) by_y*(w — /£), taking expectations, and using the above equality, we obtain 
the following set of equations. 



(5.12) 



where r(k) = E[y(n)y*(n — k)\. This set of equations has exactly the same appearance as the 
augmented normal equations Eq. (2.20). Thus, the coefficients «#,«' are the solutions to two optimal 
predictor problems: one based on the process x(n) and the other based on y(n). 

We now claim that Eq. (2.20) and (5.12) together imply R(k) = r(k), for the range of values 
< k < N. This is proved by observing that the coefficients tf/v.; and Sjy completely determine the 
lower-order optimal prediction coefficients a m ^ and errors Em (see end of Section 4.3.3, "Upward 
and Downward Recursions"). Thus, a m j and £/, satisfy smaller sets of equations of the form 
Eq. (2.20). Collecting them together, we obtain a set of equations resembling Eq. (2.37). We 
obtain a similar relation if R(i) is replaced by r(k). Thus, we have the following two sets of 
equations (demonstrated for N = 3) 



r(0) r(l) r(2) r(3) 

r*(l) r(0) r(l) r(2) 

r*{2) r*(l) r(0) r(l) 

r*(3) r*(2) r*(l) r(0) 



1 


o" 




«3,1 1 







«3,2 «2,1 1 







a 3,3 «2,2 a l. 


1 lj 





S( x x x 

o e( ; 



«?/ x 



0,?, 



/ 



r 7V+l 



i?(0) R(l) R{2) R(3) 
R*(1)R(0) R(l) R(2) 
R*(2)R*(1)R(0) R(l) 
R*(3) R*(2) R*(l) R(0) 



1 


o" 




«3,1 1 







«3,2 «2,1 1 







a 3,3 a 2,2 #1, 


I 1 ] 





XXX 



Si 



£ / 



x x 



S{ 
£, 



R 



■iV+l 



The symbol x stands for possibly nonzero entries whose values are irrelevant for the discus- 
sion. Note that A^ is lower triangular and A Ml and A„ 2 are upper triangular matrices. We 
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have not shown that A ai = A„ 2 , so we will take a different route to prove the desired claim 
Eq. (5.8). By premultiplying the preceding equations with Aj, we get 



Alt N+1 A e = A] A, 



aJRvi.iA/ = a!a„ 



The right-hand side in each of these equations is a product of two upper triangular matrices and 
is therefore upper triangular. The left-hand sides, on the other hand, are Hermitian. As a result, 
the right-hand sides must be diagonal! It is easy to verify that these diagonal matrices have Em 
as diagonal elements, so that the right-hand sides of the above two equations are identical. This 
proves that AjRjv+iA^ = Ajr/v+iA^. Since A<> is nonsingular, we can invert it, and obtain 
R-tv+1 — r N+i- This proves Eq. (5.8) indeed. □ 

In a nutshell, given {a^j} and the error £jy, we can use the downward recursion (Section 

r r 

4.3.3) to compute a m ja.nd£m torm = N— 1, N— 2, . . . So we uniquely identify R(0) = £ . Next, 
from the normal equation with m = 1, we can identify R(l). In general, if R(0), . . . , R(m — 1) are 
known, we can use the normal equation for wth order and identify R[m) from the mth. equation in 
this set (refer to Eq. (2.8)). Thus, the coefficients R(0), R(l), ■ ■ .R(N), which lead to a given set 
of {«tv.,} and £^, are unique, so r(k) and R(k) in Eqs. (2.20) and (5.12) must be the same! 

R(k) is Matched to the Autocorrelation of the IIR Filter. The AR(N) model of Fig. 5.3(b) 
can be redrawn as in Fig. 5.4 where e\ (n) is white noise with variance equal to unity and where 

? 



A N {z) 

H[z) is a causal IIR filter with impulse response, say, h{ri). The deterministic autocorrelation of 
the sequence h(n) is defined as 

^h{n)b*{n-k). 

n 

Because e\(n) is white with JE'[|ei(«)| ] = l,the autocorrelation r(k) of the AR model process y(n) 
is given by 

r(k) = ^h{n)h*{n - k). (5.14) 

n 

So we can rephrase Eq. (5.8) as follows: 
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FIGURE 5.4: Autoregressive model y(n) of a signal x(n). 

Corollary 5.1. Let x(n) be a WSS process with autocorrelation R(k) and An(z) be the 
A^th-order optimal predictor polynomial. Let h[n) be the impulse response of the causal stable IIR 

filter y£^/A N (z). Then, 

oo 

R{k) = ^h{n)h*{n - k), (5.15) 

for \k\ < N. That is, the first N -\- 1 coefficients of the deterministic autocorrelation of h(n) are 
matched to the values R(k). In particular, if x(ri) is AR(N), then Eq. (5.15) holds for all k. 

5.5 POWER SPECTRUM OF THE AR MODEL 

We just proved that the AR(N) approximation y(n) is such that its autocorrelation coefficients 
r(k) match that of the original process x(n) for the first N + 1 values of k. As N increases, we 
therefore expect the power spectrum S yy (e JUJ ) of y(n) to resemble the power spectrum S xx (e JUJ ) of 
x(n). This is demonstrated next. 

Example 5.2: AR approximations. In this example, we consider a random process generated 
as the output of the fourth-order IIR filter 

3.9 - 2.7645Z- 1 + 1.4150z" 2 - 0.5515z" 3 
G A {z) 



1 - 0.9618Z" 1 + 0.7300Z" 2 - 0.5315z" 3 + 0.5184z" 4 

This system has complex conjugate pairs of poles inside the unit circle at 0.9e ' /7r and0.8e J7T . 
With white noise of variance a e driving this system, the output has the power spectrum 

^(e^)=a 2 |G 4 (e^)| 2 . 

This is shown in Fig. 5.5 (dotted plots), with o e assumed to be such that the peak value of 5^ (e JW ) 
is unity. Clearly S xx (e JU ) is not an AR process because the numerator of G^z) is not a constant. 

Figure 5.5 shows the AR(N) approximations S yy (e J ' Aj ) of the spectrum ^(e-^) for various 
orders Af (solid plots). We see that the AR(A) approximation is quite unacceptable, whereas 
the AR(5) approximation shows significant improvement. The successive approximations are 
better and better until we find that the AR(9) approximation is nearly perfect. From the 
preceding theory, we know that the AR(9) approximation and the original process have matching 
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FIGURE 5.5: AR approximations of a power spectrum: (a) AR(4), (b) AR(5), (c) AR(6), (d) AR(7), 
(e)AR(8), and (f) AR(9). 
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FIGURE 5.5: Continued. 
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autocorrelation coefficients, that is, R(m) = r(m) for < m < 9. This is indeed seen from the 
following table: 

m R(m) r(m) 






0.1667 


0.1667 


1 


0.0518 


0.0518 


2 


-0.0054 


-0.0054 


3 


0.0031 


0.0031 


4 


-0.0519 


-0.0519 


5 


-0.0819 


-0.0819 


6 


-0.0364 


-0.0364 


7 


-0.0045 


-0.0045 


8 


0.0057 


0.0057 


9 


0.0318 


0.0318 


10 


0.0430 


0.0441 


11 


0.0234 


0.0241 



The coefficients R(m) and r[rn) begin to differ starting from m = 10 as expected. So the AR(9) 
approximation is not perfect either, although the plots of S xx (e JUJ ) and Syy(e JUJ ) are almost 
indistinguishable. 

The spectrum S yy (e, JUJ ) of the AR approximation^??) is called d.nAR-model-based estimate of 
S xx (e JUJ ). This is also said to be the maximum entropy estimate, for reasons explained later in Section 
6.6.1. We can say that the estimate S yy (e JUJ ) is obtained by extrapolating the finite autocorrelation 
segment 

R(k), \k\ <N (5.16) 

using an AR(N) model. The estimate <SL(e- ; ") is nothing but the Fourier transform of the 
extrapolated autocorrelation r(k). Note that if R(k) were "extrapolated" by forcing it to be zero for 
|^| > N, then the result may not even be a valid autocorrelation (i.e., its Fourier transform may not 
be nonnegative everywhere). Further detailed discussions on power spectrum estimation techniques 
can be found in Kay and Marple (1981), Marple (1987), Kay (1988), and Therrien (1992). 

Peaks are well matched. We know that the AR(N) model approximates the power spectrum 
S xx (e JLU ) with the all-pole spectrum power spectrum 

V(e*) = JJ^p (»7) 
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If S xx (e JUJ ) has sharp peaks, then ^v(z) has to have zeros close to the unit circle to approxi- 
mate these peaks. If, on the other hand, S xx (e JUJ ) has zeros (or sharp dips) on the unit circle, 
then S yy (e JU) ) cannot approximate these very well because it cannot have zeros on the unit 
circle. Thus, the AR\N) model can be used to obtain a good match of the power spectrum 




I Original 

AR(31) 




0.4 0.6 

(o/ji 

FIGURE 5.6: Demonstrating that the AR approximation of a power spectrum shows good agreement 
near the peaks. In the AR(18) approximation, the peaks are more nicely matched than the valleys. The 
AR(31) approximation is nearly perfect. The original power spectrum is generated with a pole-zero filter 
as in Ex. 5.2, but instead of Gt\{z), we have in this example a lOth-order real-coefficient filter Gio(z) 
with all poles inside the unit circle. 
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S xx {e JUJ ) near its peaks but not near its valleys. This is demonstrated in the example shown in 
Fig. 5.6. Such approximations, however, are useful in some applications such as the AR represen- 
tation of speech. □ 

Spectral factorization. If the process x[n) is indeed AR(N), then the autoregressive model 
is such that S xx (e JUJ ) = S yy (e J "). So, 

In other words, we have found a rational transfer function H^{z) = c/A^{z) such that 

5«(e^) = \H N {^)\ 2 . 

We see that H^(z) is a spectral factor of S xx {e JU) ). Thus, when x{n) is AR(N), the iVth-order 
prediction problem places in evidence a stable spectral factor H^(z) of the power spectrum S xx (e JlAj ). 
When x(n) is not AR(N) , the filter Hj^(z) is an AR{N) approximation to the spectral factor. 

Example 5.3: Estimation of AR model paramaters. We now consider an AR(4) process 
xiy-n) derived by driving the filter \j A 4 [z) with white noise, where 

A 4 {z) = 1.0000 - 1.8198Z" 1 +2.0425z -2 - 1.2714z~ 3 + 0.4624z~ 4 

This has roots 

0.2627 ± 0.8084/, 0.6472 ± 0.4702 j 

with absolute values 0.85 and 0.8. A sample of the AR(N) process can be generated by first 
generating a WSS white process e(n) using the Matlab command randn and driving the filter 
\jA/\,{z) with input e(n). The purpose of the example is to show that, from the measured values 
of the random process x(n), we can actually estimate the autocorrelation and hence, the model 
parameters (coefficients of A 4 (z)) accurately. 

First, we estimate the autocorrelation R(k) using (2.43), where L is the number of samples 
of x(n) available. To see the effect of noise (which is always present in practice), we add random 
noise to the output x(n), so the actual data used are 

*noi sy («) = x{n) +r](n). 

The measurement noise T](n) is assumed to be zero-mean white Gaussian. Its variance will be 
denoted as o . With L = 2 and o = 10 , and x no i sy (n) used instead of x(n), the estimated 
autocorrelation R(i) for < k < 4 is as follows: 

R(0) = 8.203, #(1) = 4.984, R(2) = -1.032, R{3) = -3.878, R(4) = -2.250. 
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FIGURE 5.7: Example 5.3. Estimation of AR model parameters. Magnitude responses of the original 
system \/A^{z) and the estimate \/A^{z) are shown for three values of the measurement noise variance 
a 1 = 10 _ , 0.0025, and a 1 = 0.01, respectively, from top to bottom. The segment length used to 



estimate autocorrelation is L = 2 



10 
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FIGURE 5.8: Example 5.3. Repetition of Fig. 5.7 with segment length L = 2 instead of 2 
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From this, the estimated prediction polynomial can be obtained as 

A 4 (z) = 1.0000 - 1.8038z _1 + 2.0131z -2 - 1.2436z -3 + 0.4491z -4 

This should be compared with the original A 4 (z) given above. The roots of^4(z) are 0.2616 ± 
0.8036y and 0.6403 ± 0.4679 j, which are inside the unit circle as expected. 

Figure 5.7 (top plot) shows the magnitude responses of the filters l/^4(z) and l/A^z), 
which are seen to be in good agreement. The middle and lower plots show these responses with 
the measurement noise variance increased to a = 0.0025 and a = 0.01, respectively. Thus, the 
estimate deteriorates with increasing measurement noise. For the same three levels of measurement 
noise, Fig. 5.8 shows similar plots for the case where the number of measured samples of x[n) are 
reduced to L = 2 . Decreasing L results in poorer estimates of A^z) because the estimated R(i) 
is less accurate. 

5.6 APPLICATION IN SIGNAL COMPRESSION 

The AR(N) model gives an approximate representation of the entire WSS process x(n), using 
only N -\- 1 numbers, (i.e., the N autoregressive coefficients a* N i and the mean square value £jy). 
In practical applications such as speech coding, this modeling is commonly used for compressing 
the signal waveform. Signals of practical interest can be modeled by WSS processes only over short 
periods. The model has to be updated as time evolves. 

It is typical to subdivide the sampled speech waveform x(n) (evidently, a real-valued sequence) 
into consecutive segments (Fig. 5.9). Each segment might be typically 20-ms long, corresponding 
to 200 samples at a sampling rate of 10 kHz. This sequence of 200 samples is viewed as representing 
a WSS random process, and the first N autocorrelation coefficients R(i) are estimated. With 
N << 200 (usually N ~ 10), the finiteness of duration of the segment is of secondary importance 
in affecting the accuracy of the estimates. There are several sophisticated ways to perform this 
estimation (Section 2.5). The simplest would be to use the average 

1 L ~ % 
R(k) « -^2 x(n)x(n - k), < k < N. (5.19) 

Here, L is the length of the segment. If 7V<< L, the end effects (caused by the use of finite 
summation) are negligible. In practice, one multiplies the segment with a smooth window to reduce 
the end effects. Further details about these techniques can be found in Rabiner and Schafer (1978). 



Note that when x(n) is real-valued as in speech, the polynomial coefficients <m,i, the parcor coefficients k,, and the 
prediction error sequence e N {ri) are real as well. 
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FIGURE 5.9: (a) Compression of a signal segment using LPC and (b) reconstruction of an approxima- 
tion of the original segment from the LPC parameters Apj(z) and £^. The kth segment of the original 
signal x(n) is approximated by a segment of the AR signal y{n). For the next segment of x(n), the LPC 
parameters Apj(z) and £/v are generated again. 



Note that with segment length L = 200 and the AR model order N = 10, a segment of 200 
samples has been compressed into 11 coefficients! The encoded message 



ff 

a N,l a N.2 ■ ■ ■ a N,N C-tv 



{5.20) 



is then transmitted, possibly after quantization. At the receiver end, one generates a local white-noise 
source e{n) with variance £jy and drives the IIR filter 

1 1 

{5.21) 



A N {z) 1 + YJi^a^z-' 

with this noise source as in Fig. 5.9(b). A segment of the output y{n) is then taken as the AR{N) 
approximation of the original speech segment. 

Pitch-excited coders and noise-excited coders. Strictly speaking, the preceding discussion 
tells only half the story. The locally generated white noise e{n) is satisfactory only when the speech 
segment under consideration is unvoiced. Such sounds have no pitch or periodicity (e.g., the part 
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"Sh" in "Should I." By contrast, voiced sounds such as vowel utterances (e.g., "I") are characterized 
by a pitch, which is the local periodicity of the utterance. This periodicity T of the speech segment 
has to be transmitted so that the receiver can reproduce faithful voiced sounds. The modification 
required at the receiver is that the AR(N) model is now driven by a periodic waveform such as an 
impulse train, with period T. Summarizing, the AR\N) model is excited by a noise generator or a 
periodic impulse train generator, according to whether the speech segment is unvoiced or voiced. 
This is summarized in Fig. 5.10. There are further complications in practice such as transitions 
from voiced to unvoiced regions and so forth. □ 

Many variations of linear prediction have been found to be useful in speech compression. A 
technique called differential pulse code modulation is widely used in many applications; this uses 
the predictive coder in a feedback loop in an ingenious way. The reader interested in the classical 
literature on speech compression techniques should study (Atal and Schroeder, 1970; Rabiner and 
Schafer, 1978; Jayant and Noll, 1984; Deller et al, 1993) or the excellent, although old, tutorial 
paper on vector quantization by Makhoul et al. (1985). 

Quantization and stability. It is shown in Appendix C that the unquantized coefficients 
a NA> which result from the normal equations are such that \/An(z) has all poles inside the unit circle 
(i.e., it is causal and stable). However, if we quantize a^i before transmission, this may not continue 
to remain true, particularly in low bit-rate coding. A solution to this problem is to transmit the 
"parcor" or lattice coefficients k m . From Section 3.3, we know that \k m \ < 1. In view of the relation 
between Levinson's recursion and lattice structures (Section 4.3.1), we also know that \/A^{z) is 
stable if and only if \k m \ < 1. Suppose the coefficients k m are quantized in such a way that \k„,\ < 1 
continues to hold after quantization. At the receiver end, one recomputes an approximation A J (z) 
of A^(z), from the quantized versions k m . Then, the AR(N) model 1/AjJ (z) is guaranteed 
to be stable. 
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FIGURE 5.10: Reconstruction of a speech segment from its AR{N) model. The source of excitation 
depends on whether the segment is voiced or unvoiced (see text). 
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In Section 7.8, we introduce the idea of line-spectrum pair (LSP). Instead of quantizing 
and transmitting the coefficients of Aj^(z) or the lattice coefficients k m , one can quantize and 
transmit the LSPs. Like the lattice coefficients, the LSP coefficients preserve stability even after 
quantization. Other advantages of using the LSP coefficients will be discussed in Section 7.8. 

5.7 MA AND ARMA PROCESSES 

We know that a WSS random process x(n) is said to be AR if it satisfies a recursive (IIR) difference 
equation of the form 

TV 

x(n) = — y d*x(n — i) + e(n), {5.2$) 

where e(ii) is a zero-mean white WSS process, and the polynomial D(z) = 1 + ^J<=i d*z~ l has 
all zeros inside the unit circle. We say that a WSS process x(n) is a moving average (MA) process 
if it satisfies a nonrecursive (FIR) difference equation of the form 

TV 



J2p*e(n-i), (5.23) 



;=o 



where e(n) is a zero-mean white WSS process. Finally, we say that a WSS process x(n) is an 
ARMA process if 



TV TV 



x(n) = — y d*x(n — i) + > p*e(n — i), (5.24) 



where e(n) is a zero-mean white WSS process. Defining the polynomials 

TV TV 



D(z) = 1 + ^2 d ^~' and p ( z ) = X^A* Z "'' ( 5 - 25 ) 



1 = 1 ;=o 



we see that the above processes can be represented as in Fig. 5.11. In each of the three cases, x(ii) is 
the output of a rational discrete time filter, driven by zero-mean white noise. For the AR process, 
the filter is an all-pole filter. For the MA process, the filter is FIR. For the ARMA process, the 
filter is IIR with both poles and zeros. 

We will not have occassion to discuss MA and ARMA processes further. However, the 
theory and practice of linear prediction has been modified to obtain approximate models for these 
types of processes. Some discussions can be found in Marple (1987) and Kay (1988). In Section 
5.5, we saw that the AR model provides an exact spectral factorization for an AR process. An 
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FIGURE 5.11: (a) AR, (b) MA, and (c) ARMA processes. 



approximate spectral factorization algorithm for MA processes, based on the theory of linear 
prediction, can be found in Friedlander (1983). 

5.8 SUMMARY 

We conclude the chapter with a summary of the properties of AR processes. 

1. An AR process x(k) of order N (AR(N) process) satisfies an iVth-order recursive difference 
equation of the form Eq. (5.22), where e(n) is zero-mean white noise. Equivalently, x(n) is 
the output of a causal stable IIR all-pole filter 1/Z>(z) (Fig. 5.11(a)) excited by white noise 
e{n). 

2. If we perform iVth-order linear prediction on a WSS process x(n), we obtain the represen- 
tation of Fig. 5.3(a), where e^(n) is the optimal prediction error and Aj^(z) is the predictor 
polynomial. By replacing e N (n) with white noise, we obtain the AR(N) model of Fig. 
5.3(b). Here, yyn) is an AR(N) process, and is the AR(N) approximation (or model) for 
x{n). 

3. The AR(N) model jy(w) approximates x(n) in the sense that the autocorrelations of the two 
processes, denoted r(k) and R(k), respectively, are matched for < \k\ < N (Theorem 5.2). 

4. The AR(N) model y(n) is such that its power spectrum S yy (e JU) ) tends to approximate 
the power spectrum S xx (e JUJ ) of x(n). This approximation improves as A^ increases and is 
particularly good near the peaks of S xx (e JLO ) (Fig. 5.6). Near the valleys of ^(e JUJ ), it is 
not so good, because the AR spectrum <SL(e JUJ ) is an all- pole spectrum and cannot have 
zeros on or near the unit circle. 
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5. If x(n) itself is AR{N), then the optimal iVth-order prediction error e^{n) is white. Thus, 
x(n) can be written as in Eq. (5.22), where e(n) is white and d{ are the solutions (usually 
denoted aj?,i) to the normal equations Eq. (2.8). Furthermore, the AR(N) power spectrum 
£^/\Af4{& -7 W )| 2 i s exactly equal to S xx (e. JU ). Thus, we can compute the power spectrum 
S X x{z JUJ ) exactly, simply by knowing the autocorrelation coefficients R(k), \k\ < TV and 
identifying An(z) using normal equations. This means, in turn, that all the autocorrelation 
coefficients R(k) of an AR(N) process x(n) are determined, if R(k) is known for \k\ < N. 
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CHAPTER 6 



Prediction Error Bound and Spectra 

Flatness 




6.1 INTRODUCTION 

In this chapter, we first derive a closed-form expression for the minimum mean-squared prediction 
error for an AR(iV) process. We then introduce an important concept called spectral flatness. This 
measures the extent to which the power spectrum S xx (e JUJ ) of a WSS process x(n), not necessarily 
AR, is "flat." The flatness 7^ is such that < 7^ < 1, with 7^ = 1 for a flat power spectrum (white 
process). We will then prove two results: 

1. For fixed mean square value R(0), as the flatness measure of an AR(iV) process x[n) gets 
smaller, the mean square optimal prediction error Sjy also gets smaller (Section 6.4). So, a 
less flat process is more predictable. 

2. For any WSS process (not necessarily AR), the flatness measure of the optimal prediction 
error e„,(n) increases as m increases. So, the power spectrum of the prediction error gets 
flatter and flatter as the prediction order m increases (Section 6.5). 



6.2 PREDICTION ERROR FOR AN AR PROCESS 

If x[n) is an AR(iV) process, we know that the optimal prediction error e N (n) is white. We will 

r 

show that the mean square error E^ can be expressed in closed form as 




(e^) duo (6.1) 



where S xx (e JUJ ) is the power spectrum of x(n). Here, Ln is the natural logarithm of a positive 
number as normally defined in elementary calculus and demonstrated in Fig. 6.1. Because x[n) is 
AR, the spectrum S xx (e JU ) > for all u), and hnS xx (e JUJ ) is real and finite for all u). The derivation 
of the above expression depends on the following lemma. 
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3, ■ 




FIGURE 6.1: A plot of the logarithm. 



Lemma 6.1. Consider a polynomial in z of the form A(z) 
the zeros oiA(z) be strictly inside the unit circle. Then, 

f27T 

lnA(e JUJ ) da; = j2n:k 



1 + X^K=l a " Z "' anC ^ ^ et a ^ 



1 

2^ 



(6.2) 



for some integer k. This implies L lnl^e- 7 ")) doj/27r = j2 r xm for some integer ot, so that 



1 

2T 



2tt 



Ln|^(( 



; w l 



2 dw = 0, 



where Ln(.) stands for the principal value of the logarithm. 



(6.3) 





Because the logarithm arises frequently in our discussions in the next few sections, it is 
important to bear in mind some of the subtleties in its definition. Consider the function In w, with 
■w = Re-* . Here, R > and 9 is the phase of w. Clearly, replacing 6 with 9 + 2iik for integer k 
does not change the value of w. However, the value of In w will depend on k. Thus, 



lnw = \j\R +j9 +j27tk, 



(6.4) 



where Ln is the logarithm of a positive number as normally defined in calculus (Fig. 6.1). Thus, In w 
has an infinite number of branches, each branch being defined by one value of the arbitrary integer 
k (Kreyszig, 1972; Churchill and Brown, 1984). If we restrict 9 to be in the range — ir < 9 < ir 
and set k = 0, then the logarithm ln w is said to be evaluated on the principal branch and denoted 
as Ln™ (called the principal value). For real and positive w, this agrees with Fig. 6.1. Note that 
Lnl = 0, but lnl = j2Trk, and its value depends on the branch number k. 

Because the branch of the function ln w has not been specified in the statement of Lemma 
6.1, the integer k is undetermined (and unimportant for our discussions). Similarly, the integer m is 
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undetermined. However, in practically all our applications, we deal with quantities of the form Eq. 
(6.1), so that it is immaterial whether we use ln(.) or Ln(.) (because exp(j27ii) = 1 for any integer 
A). The distinction should be kept in mind only when trying to follow detailed proofs. 

Proof of Lemma 6.1. Note first that because A(z) has no zeros on the unit circle, lnA(e JUJ ) 
is well defined and finite there (except for branch ambiguity). We can express 



A' 



4e>)=n(l-«>e- > ), {6.5) 

i=i 

so that 



A' 



ln^(e^) = Vln(l - a,* - -*") +j2nk. {6.6) 



=i 



It is therefore sufficient to prove that 

ln(l - a,* - -*") duj =j2nk n (6. 7) 



-1 P^TT 



27T./Q 



where k t is some integer. This proof depends crucially on the condition that the zeros of A{z) 
are inside the unit circle, that is |a,-| < 1. This condition enables us to expand the principal value 
using the power series 



Ln(l - a,e" ; ") using th<~ ' 



OO 

v" 



Ln(l-v) = -^—, \v\<l. {6.8) 



n 

n=\ 



Identifying msd= Otfi. JU , we have 

ln(l - ait~ ju> ) = - V "'" c - - +j2irii. {6.9) 

' ^ n 



a?e~ JU 

B=l 



The power series Eq. (6.8) has region of convergence \v\ < 1. We know that any power series 
converges uniformly in its region of convergence and can therefore be integrated term by term 
(Kreyszig, 1972; Churchill and Brown, 1984). Because 

e- JUJn duJ = 0, n = 1,2,3 . . ., {6.10) 



This expansion evaluates the logarithm on the principal branch because it reduces to zero when v — 0. 



68 THE THEORY OF LINEAR PREDICTION 



r N 



Eq. (6.7) readily follows, proving (6.2). Next,y? *(e- ; ") = IltiU ~ "* e ;c "). Because |a,-| < 1, the 
above argument can be repeated to obtain a similar result. Thus, 



-i / 1 2tt -< /■2-7T 

ln|4e^)| 2 dw 



2tt 



2tt 



ln^(e^)du; + 



2tt 



2tt 



ln^*(e' 7 ")da;=727r Z 



D 



for some integer m. This completes the proof. 

Derivation of Eq. (6.1). Recall that eL{n) is the output of the FIR filter A^{z), in response 
to the process x(n). So, the power spectrum ofe^(n) is given by S xx (e JU )\Ap/(e JU ')\ . Because e^(n) 
is white, its spectrum is constant, and its value at any frequency equals the mean square value Sjy of 
e^(n). Thus, 



f/=Me > )|4v(e-^)| 



ju>\\2 



exp 



Ln 



s xx ( e -n\M^)\ 



ju\\2 



exp 



2tt 



2tt 



2tt 



exp — / Ln S«(0|4v(0| 



./w\|2 



dec' 



Ln5«(e^) dw + 



2tt 



Ln|^(e^)| 2 dc; 



exp I — / LnS xx (e J ") div J , 



because L L,n\A^(e JU )\ 2 da; = by Lemma 6.1. This establishes the desired result. The third 
equality above was possible because S xx (e JU )\A^(e JU )\ is constant for all oo. 



6.3 A MEASURE OF SPECTRAL FLATNESS 

Consider a zero-mean WSS process x(n). We know that x(n) is said to be white if its power 
spectrum S xx (t ]u> ) is constant for all to. When x[n) is not white, it is of interest to introduce a 
"degree of whiteness" or "measure of flatness" for the spectrum. Although such a measure is by no 
means unique, the one introduced in this section proves to be useful in linear prediction theory, 
signal compression, and other signal processing applications. 
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Motivation from AM/GM ratio. The arithmetic-mean geometric mean inequality (abbre- 
viated as the AM— GM inequality) says that any set of positive numbers x%, . . . , x^ satisfies the 
inequality 

N / N n 1/JV 

AM GM 

with equality if and only if all x; are identical (Bellman, 1960). Thus, 

GM 
< -r— < 1 (6.12) 

AM v ' 

and the ratio GM/AM is a measure of how widely different the numbers {#,} are. As the numbers 
get closer to each other, this ratio grows and becomes unity when the numbers are identical. The 
ratio GM/AM can therefore be regarded as a measure of "flatness." The larger this ratio, the flatter 
the distribution is. □ 

By extending the concept of the GM/AM ratio to functions of a continuous variable (i.e., 
where the integer subscript i in x; is replaced with a continuous variable, say frequency) we can 
obtain a useful measure of flatness for the power spectrum. 

Definition 6.1. Spectral flatness. For a WSS process with power spectrum S xx (e JUJ ),we define 
the spectral flatness measure as 



expl — / LnS xx (e JUJ ) du; J 



% = — Y~p i 6 - 13 ) 

— / S xx {e-n du 

where it is assumed that S xx (e JlAj ) > for all u>. "0 

We will see that the ratio j x can be regarded as the GM/AM ratio of the spectrum S xx (e JUJ ). 
Clearly, 7^ > 0. The denominator of 7^ is 

,-.2tt 



1 r 

— / S xx (e^)dco = R(0), 
2vr Jo 



'0 
which is the mean square value of x(n) . We will first show that 

-!-/ 5„(e^)du;>expf i- J Ln5„(e^)dwj (6.14) 
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with equality if and only if S xx (e Jul ) is constant (i.e., x(n) is white). Because of this result, we have 

< it < 1, 

with % = 1 if and only if x{n) is white. A rigorous proof of a more general version of (6.14) can 
be found in Rudin (1974, pp. 63—64). The justification below is based on extending the AM— GM 
inequality for functions of continuous argument. 

Justification of Eq. (6.14). Consider a function f(y) > for < y < a. Suppose we divide 
the interval < y < a into N uniform subintervals of length A.y as shown in Fig. 6.2. From the 
AM— GM inequality, we obtain 



N-l 



N-l 



1/jV 



n=0 



H = 

(N-l 
Ln II [f( nA y) 
n=0 

= ex P ^H Ln /("Ay) • 



n=0 



Equality holds if and only if f («Ay) is identical for all n. We can replace \/N with Ajy/a 
(Fig. 6.2) so that the above becomes 

1 N-l /jAf-1 \ 

- 22 f( n ^y)Ay > exp - 22 Ln/(«A_y)A_y . 



n=0 



n=0 



f(y) 




H h 



H h 



Ay 2Ay 



1 ► 

NAy = a 



FIGURE 6.2: Pertaining to the proof of Eq. (6.14). 
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If we let Ay — > (i.e., N — > oo), the two summations in the above equation become integrals. We 
therefore have 



- / Ay) djy > exp I - / Ltif(y) dy 



(6.15) 



With f\y) identified as S xx (c JU) ) > and a = 2tt, we immediately obtain Eq. (6.14). 



□ 



Example 6.1. Spectral Flatness Measure. Consider a real, zero-mean WSS process x(n) 
with the power spectrum S xx (e JLU ) shown in Fig. 6.3(a). Fig. 6.3(b) shows the spectrum for various 
values of a. It can be verified that 



2k Jo 



The quantity in the numerator of Eq. (6.13) can be computed as 
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LnS xx (e JUJ ) duj 
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FIGURE 6.3: (a) Power spectrum for Ex. 6.1 and (b) three special cases of this power spectrum. 
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where a = U) c /ir. We can simplify this and substitute into the definition of 7^ to obtain the flatness 



measure 



This is plotted in Fig. 6.4 as a function of a. We see that as a increases from zero to unity, the 
spectrum gets flatter, as expected from an examination of Fig. 6.3(b). 

6.4 SPECTRAL FLATNESS OF AN AR PROCESS 

We know that if x[n) is an AR(iV) process, then the mean square value of the optimal prediction 
error is given by Eq. (6.1). The spectral flatness measure r ) x defined in Eq. (6.13) therefore becomes 



£ 



f 



N 



R{0) 



(6.16) 



For fixed R{0), the mean square prediction error Sjy is smaller if the flatness measure 7^ is 
smaller. Qualitatively speaking, this means that if x\n) has a very nonflat power spectrum, then the 
prediction error £^ can be made very small. If on the other hand the spectrum S xx (e. JU) ) of x[n) is 
nearly flat, then the prediction error £^ is nearly as large as the mean square value of x(n) itself. 

Expression in Terms of the Impulse Response. An equivalent and useful way to express the 
above flatness measure is provided by the fact that x(n) is the output of the IIR filter 
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FIGURE 6.4: Flatness measure 7^ in Ex. 6.1. 
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in response to the input e N (n) (Fig. 2.1(b)). Because x{n) is AR(iV), the error e^(n) is white, and 
therefore, the power spectrum of x(n) is 



K(e>)| 2 
The mean square value of x(n) is therefore given by 

where g(«) is the causal impulse response of G(z). So, Eq. (6.16) becomes 

1 _ 1 



I X 



where 



E g = 2^ \g(n) \ 2 = energy of the filter G(z). 



n=0 



Summarizing, we have 

Theorem 6.1. Spectral flatness of AR process. Let x{n) be an AR(iV) process with mean square 
value R(0) = E[\x(n)\ 2 ]. hetA^z) be the optimal iVth-order prediction polynomial and £^ the 
corresponding mean square prediction error. Then, the spectral flatness ^ 2 of the process x{n) can 
be expressed in either of the following two forms: 

1. 7, 2 = £//#(0),or 

where g(n) is the causal impulse response of G(z) = 1/A^{z). (} 

Example 6.2. Spectral Flatness of an AR(1) Process. In Ex. 3.3, we considered a process 
X\n) with autocorrelation 

R{k) = p\\ 

with — 1 < p < 1. The optimal first-order prediction polynomial was found to be 

A(z)=l-pz-\ 
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FIGURE 6.5: The flatness of an AR(1) process as a function of the correlation coefficient p. 

We also saw that the first-order prediction error sequence e{ («) is white. This means that the 
process x(n) is AR(l). The IIR all-pole filter G(z) defined in Theorem 6.1 is 

1 1 



G(* 



and its impulse response is g( n) 
for the AR(1) process x(n) is 



A{z) l-pz- 1 
p"U(n). So Y^Lo^{ n ) = 1/(1 ~~ p 2 ), and the flatness measure 



2 1 2 

7* = 1 - P ■ 



This is plotted in Fig. 6.5. So, the process gets flatter and flatter as p gets smaller. The power 
spectrum of the process x(n), which is the Fourier transform of R(k), is given by 

1-p 2 1-p 2 



^jwcV, / 



|1 — pe 



-jui\2 



1 + p 2 — 2p cos u 



This is depicted in Fig. 6.6 for two (extreme) values of p in the range < p < 1. (For —1 < p < 0, 
it is similar except that the peak occurs at n.) As p increases from zero to unity (i.e., the pole of 



S xx (e ) 



0.5 



p = 0.1 



S xx (e ) 




FIGURE 6.6: The power spectrum of an AR(1) process for two values of the correlation coefficient p. 
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G(z) gets closer to the unit circle), the spectrum gets more and more peaky (less and less flat), 
consistent with the fact that 7^ decreases as in Fig. 6.5. 

6.5 CASE WHERE SIGNAL IS NOT AR 

We showed that if x(n) is an AR(iV) process, then the iVth-order mean square prediction error is 
given by (6.1). When x(n) is not AR, it turns out that this same quantity acts as a lower bound for 
the mean square prediction error as demonstrated in Fig. 6.7. We now prove this result. 

Theorem 6.2. Prediction error bound. Let x[n) be a WSS process with power spectrum 
Sxx(e JLJ ) finite and nonzero for all 10. Let Em be the mean square value of the optimal wth-order 
prediction error. Then, 



^/>expf^y LmUe^dcA 



(6.17) 



Furthermore, equality holds if and only if x(n) is AR(iV) and m> N. () 

Proof. We know that the prediction error eL{n) is the output of the FIR filter A m (z) with 
input equal to x(n) (Fig. 2.1(a)). So, its power spectrum is given by S xx (e JUJ ) \A m (e. JU} ) | , so that 



,-.2tt 
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m 2n 



^(e^)K(e^)| 2 do; 
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S„(Ol4»(Or <H % Eq. (6.15)) 



6XP Ur" 



2-1- 



2 k 



2tt 



Ln^(e^) dcj + / Ln|^(e^)| 2 dw 
^0 

2tt \ 



exp — / hnS xx (e JUJ ) du) I (by Lemma 6.1). 
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FIGURE 6.7: The optimal prediction error Em decreases monotonically with increasing m but is lower 
bounded by Eq. (6.1) for any WSS process. 
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Equality holds if and only if S xx (e JU ) \A m (e. JU ) | is constant, or equivalently, x{n) is AR with order 



< m. The proof is therefore complete. 



□ 



6.5.1 Error Spectrum Gets Flatter as Predictor Order Grows 

We will now show that the power spectrum of the error e„(n) gets "flatter" as the prediction order 
m increases. With S m (e^) denoting the power spectrum oieL{n), we have 

S m {t j ") = S xx {t j ")\A m {t j ")\ 2 . 

The flatness measure 7^ for the process e m {n) is 

I-27T \ 



exp 



2tt 



LnS m (e J ") dw 



7» 



exp 



2tt 



2tt 



2n 



5 m (e>) dcj 



2tt 



Ln 



^(e^)K(e^)| 



;ui\|2 



due' 






,-27T 



Using L Ln|-4 m (e- ?a ')| dw = (Lemma 6.1), this simplifies to 



exp I 



2tt 



In 



— J LnS^O dw 



(<Utf) 



The numerator is fixed for a given process. As the prediction order increases, the error £ m decreases. 
This proves that the flatness is an increasing function of prediction order m (Fig. 6.8). 



flatness 
of error 



Q O 



1 2 



_^ jYi (prediction 
order) 



FIGURE 6.8: The flatness 7^ of the optimal prediction error increases monotonically with increasing 
m. It is bounded above by unity. 
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We can rewrite 7^ in terms of the flatness measure 7^ of the original process x(n) as follows: 

7^(0) 



7. 






Here, 7^ andi^(O) are constants determined by the input process x(n). The flatness 7^ of the 
process e m \n) is inversely proportional to the mean square value of the prediction error &„. 

Example 6.3: Spectral Flattening With Linear Prediction. In this example, we consid 
AR(4) process with power spectrum 



error 



eran 



*^a:a:V / 



1 



|^(e>)| : 



where 



A(z) = 1 - 1.8198Z" 1 +2.0425z' 2 - 1.2714z' 3 + 0.4624z" 4 
The first few elements of the autocorrelation are 



R(0) R(l) R{2) R(3) R(4) 
7.674 4.967 -0.219 -3.083 -2.397 



R(5) R(6) R(7) 

-0.639 -0.087 -0.474 



If we do linear prediction on this process, the prediction polynomials are 
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1.0 
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1.0 


-0.6473 
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1.0 


-1.1457 


0.7701 














4W 


1.0 


-1.5669 


1.3967 


-0.5469 











A A {z) 


1.0 


-1.8198 


2.0425 


-1.2714 


0.4624 








A 5 {z) 


1.0 


-1.8198 


2.0425 


-1.2714 


0.4624 








A(,{z) 


1.0 


-1.8198 


2.0425 


-1.2714 


0.4624 









Thus, after Aa,{z), the polynomial does not change, because the original process is AR(4). The 
lattice coefficients are 



h 



h h 



-0.6473 0.7701 -0.5469 0.4624 0.0 0.0 0.0 



Again, after ^4, all the coefficients are zero because the original process is AR(4). 
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Fig. 6.9 shows the power spectrum S m (e JUJ ) of the prediction error e m {n) for 1 <m< 4. The 
original power spectrum S xx (e JiAj ) = ^(e 7 ") is shown in each plot for comparison. For convenience, 
the plots are shown normalized so that S xx (e JUJ ) has peak value equal to unity. It is seen that the 
spectrum S„,(e JUJ ) gets flatter as m grows, and ^(e- 7 ") is completely fiat, again because the original 
process is AR(4). The flatness measure, calculated for each of these power spectra, is as follows: 
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FIGURE 6.9: Original AR(4) power spectrum (dotted) and power spectrum of prediction error 
(solid), (a) Prediction order = 1, (b) prediction order = 2, (c) prediction order = 3, and (d) prediction 
order = 4. 
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FIGURE 6.9: Continued. 
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6.5.2 Mean Square Error and Determinant 

In Section 2.4.3, we found that the minimized mean square prediction error can be written in 
terms of the determinant of the autocorrelation matrix. Thus, 



/ ef 



detR m — c m _ 1 c m _ 2 ... t . 



(6.19) 



•/ 



We know that the sequence £„, is monotone nonincreasing and lower bounded by zero. So it 
reaches a limit £<4 as m — > oo. We will now show that 



lim det R„ 



l/ti 



£ f 



(6.20) 



80 THE THEORY OF LINEAR PREDICTION 

The quantity (det R m ) ' m can be related to the differential entropy of a Gaussian random process 
(Cover and Thomas, 1991, p. 273). We will say more about the entropy in Section 6.6.1. 
Proof of (6.20). If a sequence of numbers Oq, ot\, a2, ■ ■ • tends to a limit a, then 



M-\ 



M-\ 

lim — > a; = a. (6.21) 



-o 



That is, the average value of the numbers tends to a as well (Problem 26). Defining a,- = Ln£ ; - , 
we have from Eq. (6.19) 

Ln (detR*) = Ln£/_j + Ln£/_ 2 . . . + Lnf/. 

Because Ln(^) is a continuous function when x > 0, we can say 

limLn£/=Ln£^. (6.22) 

i— >oo 

(Problem 26). We therefore have 



lim — Ln(detR m ) = lim — I Ln£;_j +Ln£f_, ... +Ln£ ( 

by invoking Eq. (6.21) and (6.22). This implies 

exp lim — Ln(detR m ) = exp(Ln£ ( ^). 

Because exp(x) is a continuous function of the argument x, we can interchange the limit with 
exp(.). This results in (6.20) indeed. □ 

A deeper result, called Szego's theorem (Haykin, 1986, pp. 96), can be used to prove the 
further fact that the limit in Eq. (6.20) is really equal to the bound on the right-hand side of 
Eq. (6.17). That is, 

JhmfdetRj = £ £ = exp f -L J LaS m (e^)6u\. (6.23) 
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In other words, the lower bound on the mean square prediction error, which is achieved for the 
AR(iV) case for finite prediction order, is approached asymptotically for the non-AR case. For 
further details on the asymptotic behavior, see Grenander and Szego (1958) and Gray (1972). 

Predictability measure. We now see that the flatness measure in Eq. (6.13) can be written 
in the form 



s 



f 



->> = W) {6 - 24) 

For very small 7^, the power Sio in the prediction error is therefore much smaller than the power 
R(0) in the original process, and we say that the process is very predictable. On the other hand, if 
7^ is close to unity then the ultimate prediction error is only slightly smaller that R(0), and we say 
that the process is not very predictable. For white noise, 7^ = 1, and the process is not predictable, 
as one would expect. The reciprocal 1/7^ can be regarded as a convenient measure of predictability 
of a process. □ 

6.6 MAXIMUM ENTROPY AND LINEAR PREDICTION 

There is a close relation between linear prediction and the so-called maximum entropy methods 
(Burg, 1972). We now briefly outline this. In Section 6.5, we showed that the wth-order prediction 

r 

error 8m for a WSS process x(n) is bounded as 

Sf>expl-^[ Ln^(e^)dcA (6.25) 

We also know that if the process x(n) is AR(iV), then equality is achieved in Eq. (6.25) whenever 
the prediction order m > N. Now consider two WSS processes x(n) and jy(«) with autocorrelations 
R(k) and r(k), respectively, and power spectra S xx (t JU) ) and Syy(e JW ), respectively. Assume that 

R(k) =r(k), \k\ <N. 

Then the predictor polynomials A m (z) and the mean square prediction errors £,{ will be identical 
for the processes, for 1 < m < N. 

Suppose now thatj/(«) is AR(iV), but x(n) is not necessarily AR, then from Section 6.2, we 
know that 



exp(i-y Ln^(e>)do;j = e£ 
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From Theorem 6.2, on the other hand, we know that 

£/>exp(^/ Ln5„(e^)dwj 

Comparison of the two preceding equations shows that 

exp f j- / Ln^(e^) dcj J > exp f j- / LnS,,(e^) dcj ) . 

Moreover, they are equal if and only if x(n) is also AR(iV) (by Theorem 6.2). Because exp(v) is a 
monotone-increasing function of real v, the above also means 

2tt p2tt 

LnS yy (e JUJ )duJ > / LnS xx (e JUJ )duJ. 
o Jo 

Summarizing, we have proved the following result. 

Theorem 6.3. Let x(n) and y(n) be two WSS processes with autocorrelations R(k) and 
r(k), respectively, and power spectra S xx (e JUJ ) > and Syy(e JUJ ) > 0, respectively. Assume that 
r(k) = R(k) for \k\ < iVand that y(n) is AR(iV), then 

/"27T /"27T 

/ LnSyyie^) duj> LnS xx (e JUJ ) dco (6.26) 

Jo Jo 

* v ' 

AR(N) process 
with equality if and only if x(n) is also AR. <D 

6.6.1 Connection to the Notion of Entropy 

The importance of the above result lies in the fact that the integrals in Eq. (6.26) are related to the 
notion of entropy in information theory. If a WSS process y(n) is Gaussian, then it can be shown 
(Cover and Thomas, 1991) that the quantity 

4> = a + b \ Ln^(e' ; ") dw (6.27) 

Jo 

is equal to the entropy per sample (entropy rate) of the process, where a = In v 2vre and b = l/4ir. 

Details. For discrete-amplitude random variables, entropy is related to the extent of 

"uncertainty" or "randomness." For a continuous-amplitude random variable, the entropy is more 

appropriately called differential entropy and should be interpreted carefully. Thus, given a random 
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variable x with differential entropy T~C X , consider the quantized version x& with a fixed quantization 
step size A. Then the number of bits required to represent x^ is 

b/\ = H x + c, 

where c depends only on the step size A (Cover and Thomas, 1991, p. 229). Thus, for fixed step 
size, larger differential entropy implies more number of bits, uncertainty, or randomness. □ 

Although the integral in the second term of (6.27) will be referred to as "entropy" and 
interpreted as "randomness" in the following discussions, the details given above should be kept 
in mind. The problem of maximizing <f> under appropriate constraints is said to be an entropy 
maximization problem. Theorem 6.3 says that, among all Gaussian WSS processes with fixed 
autocorrelation for \k\ < N, the entropy has the maximum value for AR(N) processes. Furthermore, 
all AR(iV) processes with autocorrelation fixed as above have the same entropy. 

Now assume that y(n) is the AR(iV) model for x{n) obtained by use of linear prediction as 
in Fig. 5.3. We know that in this model e{n) is white noise with variance ££. If we also make e(n) 
Gaussian, then the output y(n) will be Gaussian as well (Papoulis, 1965). In other words, we have 
obtained an AR(iV) model y(n) for the process x(n), with the following properties. 

1. The AR(iV) process y(n) is Gaussian. 

2. The autocorrelation r(A) of the AR(iV) process y\n) matches the autocorrelation R(i) of 
the given process x(n) (i.e., R(k) = r\k)), for \k\ < N (as shown in Section 5.4). 

3. Under this matching constraint, the entropy L hnSyy(e.^) da; of the process y(n) is the 
largest possible. In other words, y(n) is "as random as it could be," under the constraint 
r(k) =i?(»for|A;| < N. 

The relation between linear prediction and maximum entropy methods is also discussed by Burg 
(1972), Robinson (1982), and Cover and Thomas (1991). Robinson (1982) also gives a good 
perspective on the history of spectrum estimation. Also see Papoulis (1981). 

6.6.2 A Direct Maximization Problem 

A second justification of Theorem 6.3 will now be mentioned primarily because it might be more 
insightful for some readers. Suppose we forget about the linear prediction problem and consider 
the following maximization problem directly: we are given a WSS process x(n) with autocorrelation 
R(i)- We wish to find a WSS process y(n) with the following properties. 

1. Its autocorrelation r(k) satisfies r(k) = R(k) for \k\ < N. 
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2. With Syy(e Jal ) denoting the power spectrum of y(n), the entropy L ~LnS yy (e; JUJ ) du) is as 
large as possible. This must be achieved by optimizing the coefficients r(k) for \k\ > N. 

A necessary condition for the maximization 2 of the entropy (^defined in Eq. (6.27) is d(f>/dr (k) = 
for |/^| > N. Now, 



dr{k) J dr{k) 



* ' ^^K. 



,ye>) e>r(*) 
e - -^* dw 



27T 



o S yy (eJ u ) 

The last equality follows by using ^(e^) = ^^.^ r(k)e~ JlAjk . Setting d(f)/dr(k) = 0, we 
therefore obtain 



2lT -i 



Syy\ eJ ) 



-*** dto = 0, 1*1 > N. 



This says that the inverse Fourier transform of 1/ S yy (e JUJ ) should be of finite duration, restricted to 
the region \k\ < N. In other words, we must be able to express l/S yy (e JUJ ) as 

cMn{^)\\ 



Syy{^) 

or equivalently 



K(e^)| 2 ' 



where Aj^(z) is an FIR function of the form Aj^(z) = 1 + 2^<=i a Ni z ~'- Thus, S yy (e JiAj ) is an 
autoregressive spectrum. 

Because the desired solution has been proved to be AR(Af), we can find it simply by finding 
the AR(AQ model of the process x(n) using linear prediction techniques (Section 5.3). This is 
because we already know that the AR(A^ ) model y(n) arising out of linear prediction satisfies 
r(k) = R(k) for \k\ < A^ (Section 5.4) and, furthermore, that an AR(N) spectrum satisfying the 
constraint r(k) = R(k), \k\ < A^is unique (because the solution to the normal equations is unique). 



"The fact that this maximizes rather than minimizes <fi is verified by examining the Hessian matrix (Chong and 
Zak, 2001; Antoniou and Lu, 2007). 
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6.6.3 Entropy of the Prediction Error 

Let eln{n) be the »rth-order optimal prediction error for a WSS process x{n), and let S m (e. JU) ) be 
the power spectrum for e m (n). With S xx (e JU ) denoting the power spectrum of x(n), we have 

s m ( e n = s xx (e-n\Men\ 2 

and S (e ju ) = S xx {e-^). Because f** Ln\A m {e-'")\ 2 duj = (Lemma 6.1), we have 

r27T /»27T 



/■Z7T />Z7T 

/ LnS M (e>) dcj = / Ln^(e^) dcj, 



for all id. \i x(n) is Gaussian, then so is e«(«), and the integral above represents the entropy of 
el(n). 

Thus, in the Gaussian case, the entropy of the prediction error is the same for all prediction 
orders m and equal to the entropy of the input x(n). From Eq. (6.18), we see that the flatness 7^ 
of the error e m {n) is given by 

2 K , s 

ll = — f , (6.28) 



£. 



r 



where K is independent of m and depends only on the entropy of x(n). So, the flatness of the 
prediction error is proportional to l/£m. Thus, as the prediction order m increases, the error 
spectrum becomes flatter, not because the entropy increases, but because the mean square error Em 
decreases. 

6.7 CONCLUDING REMARKS 

The derivation of the optimal linear predictor was based only on the idea that the mean-squared 
prediction error should be minimized. We also pointed out earlier that the solution has the 
minimum-phase property and that it has structural interpretations in terms of lattices. We now 
see that the solution has interesting connections to flatness measures, entropy, and so forth. It is 
fascinating to see how all these connections arise starting from the "simple" objective of minimizing 
a mean square error. 

Another interesting topic we have not discussed here is the prediction of a continuous-time 
band-limited signal {torn a finite number of past samples. With sufficiently large sampling rate, this 
can be done accurately, with predictor coefficients independent of the signal! This result dates back 
to Wainstein and Zubakov (1962). Further results can be found in Vaidyanathan (1987), along 
with a discussion of the history of this problem. 
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CHAPTER 7 





Line Spectral Processes 



7.1 INTRODUCTION 

A WSS random process is said to be line spectral, if the power spectrum S xx (e JUJ ) consists only of 
Dirac delta functions, that is, 



S xx (e JUJ ) =2ir) j c i 5 a ( 



U) 



U>; 



< u> < 2n, 



(7.1) 



where {w,} is a set of L distinct frequencies (called line frequencies) and q > 0. If q > for all i, 
then L represents the number of lines in the spectrum as demonstrated in Fig. 7.1, and we say that 
L is the degree of the process. We also sometimes abbreviate this as a Linespec(Z,) process. Note 
that the power of the signal x(n) around frequency u>{ is C{, that is, 






dco 



(7.2) 



for sufficiently small e > 0. So we say that q is the power at the line frequency a>,-. 

In this chapter, we study line spectral processes in detail. We first express the Linespec(Z,) 
property concisely in terms of the autocorrelation matrix. We then study the time domain 
properties and descriptions of Linespec(Z,) processes. For example, we show that a Linespec(Z) 
process satisfies a homogeneous difference equation. Using this, we can predict, with zero error, 
all of its samples, if we know the values of L successive samples. We then consider the problem of 
identifying a Linespec(-L) process in noise. More specifically, suppose we have a signal y(n), which 
is a sum of a Linespec(Z,) process x(n) and uncorrelated white noise e(n): 

y(n) = x(n) + e(n). 

We will show how the line frequencies U>; and the line powers q in Eq. (7.1) can be extracted from 
the noisy signal. 
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FIGURE 7.1: Power spectrum of a line spectral process with L lines. The Mi line is a Dirac delta 
function with height 27tq. 



7.2 AUTOCORRELATION OF A LINE SPECTRAL PROCESS 

The autocorrelation of the line spectral process is the inverse Fourier transform of Eq. (7.1): 



a(*) = X> 



JUJik 



(7.3) 



i-i 



This is a superposition of single-frequency sequences. Consider the (L + 1) X (LA- 1) autocorre- 
lation matrix R^+i of a WSS process: 



R-L+l 



R(0) R(l) 

R*(l) R(0) 

R*(L) R*(L - 1) 



R(L) 
R(L - 1) 

R(0) 



(7- 



The Linespec(Z,) property can be stated entirely in terms of R/,+1 and R^ as shown by the following 
result. 

Theorem 7.1. Line spectral processes. A WSS process x(n) is line spectral with degree/, if and 
only if the matrix R/,+1 is singular and R^, is nonsingular. <$> 

Proof. We already showed in Section 2.4.2 that if R/,+1 is singular, then the power spectrum 
has the form Eq. (7.1) (line spectral, with degree < L). Conversely, assume x(n) is line spectral 
with degree < L. If we pass x(n) through an FIR filter V(z) of order < L with zeros at the line 
frequencies cj,-, the output is zero for all n. With the filter written as 



V(z) = Y,vU~\ 



(7.5) 
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the output is 



Defining 



vq x(n) + v* x(n — 1) + . . . + v£ x(n — L) = 0. 



(«) = x(n) x(n — 1) ... x(n — L) 



(7.6) 



and v 

v^£[x(w)x" f ( 



Vq Vl . . . V L 



, we can rewrite (7.6) as v^x(«) = 0. From this, we obtain 



]v = 0, that is, 



^R L+1 v = 0. 



Because R/,+1 is positive semidefinite, the above equation implies that Rl+i is singular. Summa- 
rizing, Rx+i is singular if and only if x(n) is line spectral with degree < L. By replacing L + 1 
with L, it also follows that R^ is singular if and only if x(n) is line spectral with degree < L — 1. 
This means, in particular, that if x(n) is line spectral with degree equal to L, then R^ has to be 
nonsingular and, of course, R/,+1 singular. 

Conversely, assume R^ is nonsingular and R^+i singular. The latter implies that the process 
is line spectral with degree < L, as we already showed. The former implies that the degree is not 
< L — 1. So the degree has to be exactly/,. □ 

Notice, by the way, that R^ is a principal submatrix of R^+i in two ways: 



R 



i+i 



R £ x 

x R(0) 



R(0) x 
x R L 



(7.7) 



We now show that if x(n) is Linespec(Z,), there is a unique nonzero vector v (up to a scale factor) 
that annihilates R/,+1 : 



R 



■L+W 



0. 



(7.8) 



Proof. Because R/,+1 is singular, it is obvious that there exists an annihilating vector v. To 
prove uniqueness, we assume there are two linearly independent vectors u/0 and w/0, such that 
Ri + iu = Ri + iw = 0, and bring about a contradiction. The 0th elements of these vectors must be 
nonzero, that is, uq ^ and wo 7^ 0> f° r otherwise the remaining components annihilate R^ in 
view of (7.7), contradicting nonsingularity of R^. Now consider the vector y = vl/uq — w/wq. This 
is nonzero (because u and w are linearly independent), but its zeroth component is zero: 



7^0 



Because R^+iy = 0, it follows that R^z = 0, z^ 0, which violates nonsingularity of R^. □ 
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The result can also be proved by using the eigenvalue interlace property for principal 
submatrices of Hermitian matrices (Horn and Johnson, 1985). But the above proof is more direct. 

7.2.1 The Characteristic Polynomial 

The unique vector v that annihilates Rj,+i has zeroth component vq ^ 0. For, if vq = 0, then the 
smaller vector 



iT 



V 1 v 2 . . . V L 



would satisfy Rjw = (in view of Eq. (7.7)) contradicting the fact that that Rj, is nonsingular. 
So, without loss of generality, we can assume vq = 1. So the FIR filter V(z) in Eq. (7.5), which 
annihilates the line spectral process x(n), can be written as 

L 

i=\ 

This filter, determined by the unique eigenvector v, is itself unique and is called the characteristic 
polynomial or characteristic filter of the line spectral process. From Eq. (7.6), we see that a 
Linespec(Z) process x[n) also satisfies the homogeneous difference equation 



x(n) = — y v* x(n — i) . 



(7.9) 



This means, in particular, that all future samples can be predicted with zero error if a block of L 
samples is known. 



rank of R r 



19 
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1 2 



L 
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FIGURE 7.2: The rank of the autocorrelation matrix R,„ as a function of size m, for a Linespec(L) 
process. 
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7.2.2 Rank Saturation for a Line Spectral Process 

From the above discussions, we see that if x(n) is Linespec(Z), then Rj, as well as R.£+i have rank 
L. It can, in fact, be proved (Problem 24) that 

, „ m for 1 < m < L , 

rankR,„= { "7 {7.10) 

I L tor m > L. 

This is indicated in Fig. 7.2. Thus, the rank saturates as soon as m exceeds the value L. 

7.3 TIME DOMAIN DESCRIPTIONS 

Line spectral processes exhibit a number of special properties, which are particularly useful when 
viewed in the time domain. In this section, we discuss some of these. 

7.3.1 Extrapolation of Autocorrelation 

We showed that a Linespec(-L) process x(n) sastisfies the homogeneous difference equation 
Eq. (7.9). If we multiply both sides of (7.9) by x*[n — k) and take expectations, we get 

L 

(=1 

where we have used R(k) = E[x(n)x*(n — £)]. Conversely, we will see that whenever the autocor- 
relation satisfies the above recursion, the random process x(n) itself satisfies the same recursion, 
that is, (7.9) holds. 

Thus, if we know R(i) for < k < L — 1 for a Linespec(Z,) process x[n), we can compute 
R(k) for all k using (7.11). This is similar in spirit to the case where x(n) is AH(N), in which case, 
the knowledge of R(k) for < k < TV would reveal the polynomial A^[z) and therefore the value 
o£R(k) for all k (Section 5.3.2). 

7.3.2 All Zeros on the Unit Circle 

From the proof of Theorem 7.1, we know that V(z) annihilates the process x(n). This implies, in 
particular, that all the L zeros of 

L 

m= 1 
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are on the unit circle at the L distinct points UJ;, where S xx (e JUJ ) has the lines. This means, in 
particular, that the numbers v- t satisfy the property 

Vi = cv* L _i , \c\=l, 0<i<L, (7.13) 

where vq = 1. That is, V(z) is a linear phase filter. Because V(z) has the L distinct zeros e JUJi , the 
solutions to the homogeneous equation Eq. (7.9) necessarily have the form 



x{n / 



J2d ie JUJi ". (7.14) 



Note that there are only L random variables d{\n Eq. (7.14), although the "random" process x(n) is an 
infinite sequence. The WSS property of x(n) imposes some strong conditions on the joint behavior 
of the random variables d-„ which we shall derive soon. The deterministic quantity q in Eq. (7.3) is 
the power of the process x(n) at the line frequency LO{, whereas the random variable di is said to be 
the random amplitude (possibly complex) at the frequency 10{. 

7.3.3 Periodicity of a Line Spectral Process 

The process (7.14) is periodic if and only if all the frequencies lu; are rational multiples of 2n 
(Oppenheim and Schafer, 1999). That is, 

u { = (— W 
VAT,-/ 

for integers K^ M;. In this case, we can write ui{ = 2nLi/ M where .Mis the least common multiple 
(LCM) of the set {M;}, and Z,, are integers. Thus, all the frequencies are harmonics of the 
fundamental frequency 2tt/M. Under this condition, we say that the process is periodic with period 
M or just harmonic. Because R(k) has the same form as x(n) (compare Eqs. (7.3) and (7.14)), we 
see that R(k) is periodic if and only if x(n) is periodic. In this connection, the following result is 
particularly interesting. 

Theorem 7.2. Let x(n) be a WSS random process with autocorrelation R(k). Suppose 
R(0) = R(M) for some M > 0. Then, x(n) is periodic with period M, and so is R(k). () 

Proof. Because R(0) = R(M),we have 

E[x(n)x*(n)\ = E[x(n)x*(n - M)]. (7.15) 



Another way to see this would be as follows: R(k) satisfies the homogeneous difference equation (7.11). At the 
same time, it also has the form (7.3). From the theory of homogeneous equations, it then follows that the L zeros 
of V(z) have the form e JUJ ' . 
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The mean square value of x(n) — x(n — M), namely, 



x(n) - x{n - M) ) (x*(n) - x*{n - M) 



can be written as 



li = E[\x{n) | 2 ] - E[x(n)x*(n - M)} 

-E[x{n-M)x*{n)} + E[\x{n - M)\ 2 ]. 

The first and the last terms are equal because of WSS property. Substituting further from 
Eq. (7.15), we get // = 0. So the random variable x(n) — x[n — M) has mean square value equal 
to zero. This implies x(k) = x(n — M). So x(n) is periodic with period M. Using the definition 
R(k) = E[x(n)x*(n — k)\, it is readily verified that R(k) is also periodic, with period M. □ 

7.3.4 Determining the Parameters of a Line Spectral Process 

Given the autocorrelation R(k), < k < L of the Linespec(Z,) process x(n), we can identify the 
unique eigenvector v satisfying R/, + iv = 0. Then, the unique characteristic polynomial V(z) is 
determined (Eq. (7.12)). All its L zeros are guaranteed to be on the unit circle (i.e., they have the 
form e JUi ). The line frequencies u>i can therefore be identified. From Eq. (7.3), we can solve for the 
constants q (line powers) by writing a set of linear equations. For example, if L = 3, we have 



(7.16) 



The 3x3 matrix is a Vandermonde matrix and is nonsingular because a*,- are distinct (Horn and 
Johnson, 1985). So, we can uniquely identify the line powers c,-. Thus, the line frequencies and 
powers in Fig. 7.1 have been completely determined from the set of coefficients R(i), < k < L. 

Identifying amplitudes of sinusoids in x{n). Similarly, consider the samples x(n) of 
the Linespec(Z,) random process x(n). If we are given the values of L successive samples, say 
x(0),x(l), . . .x(L — 1), we can uniquely identify the coefficients dj appearing in Eq. (7.14) by 
solving a set of equations similar to (7.16). Thus, once we have measured L samples of the random 
process, we can find all the future and past samples exactly, with zero error! The randomness is only 
in the L coefficients {<f,-}, and once these have been identified, x(n) is known for all n. □ 

7.4 FURTHER PROPERTIES OF TIME DOMAIN DESCRIPTIONS 

Starting from the assumption that x(n) is a line spectral process, we derived the time domain 
expression Eq. (7.14). Conversely, consider a sequence of the form Eq. (7.14), where d{ are random 



1 1 1 




C\ 




R(0) 


e .M e >"2 e i^3 
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R(l) 


e 2jut e 2juj2 e 2juj 3 




J3_ 




R(2) 
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variables. Is it necessarily a line spectral process? Before addressing this question, we first have to 
address WSS property. The WSS properly of Eq. (7.14) itself imposes some severe restrictions on 
the random variables d;, as shown next. 

Lemma 7.1. Consider a random process of the form 



x(n) = ^dit jUin , (7.17) 



where d{ are random variables and ui{ are distinct constants. Then, x(n) is a WSS process if and 
only if 

1. E[d t ] = whenever loi ^ (zero-mean condition). 

2. E[d{d*\ = for i ^ m (orthogonality condition). <0> 

Proof. Recall that x(n) is WSS if £'[«;(«)] and E[x(n)x*(n — k)\ are independent of n. First, 
assume that the two conditions in the lemma are satisfied. We have i?[*(«)] = ^J=i E\dj\e JU)in . By 
the first condition of the lemma, this is constant for all n. Next, 



L L 



E[x{n)x*{n - k)} = 2^2^E[d i d*]e J ^ n e- JU '" ( "- k) (7.18) 

z— 1 m— 1 

With d; satisfying the second condition of the lemma, this reduces to 

L 

E[x(n)x*(n - k)\ = ^JE^fje*"*, (7.19) 



i=i 



which is independent of n. Thus, the two conditions of the lemma imply that x(n) is WSS. Now 
consider the converse. Suppose x(n) is WSS. Because £[#(«)] is independent of n, we see from 
Eq. (7.17) that E[d-\ = whenever loi 7^ 0. So the first condition of the lemma is necessary. Next, 
Eq. (7.18) can be rewritten as 



L L 



E[x(n)x*(n - k)\ = ^ ^ e JUJin E[d i d*]e- JUJ "-"e-' 



i—1 m—1 
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Express the right-hand side as 

[e JLJl " ...e^ £ "]D 



-ju> 2 n 












t Mk 







t Mk 


-ju> L n 




e JUJ L i 






J 



call this A(n) 



t(fc) 



where D is an L x L matrix with [D], m = E[d;d*]. WSS property requires that A(n)t(k) be 
independent of n, for any fixed choice of k. Thus, defining the matrix 

T=[t(0)t(l)...t(L-l)], 

we see that A(w)T must be independent of n. But T is a Vandermonde matrix and is nonsingular 
because LOi are distinct. So A(«) must itself be independent of n. Consider now its with column: 



[A(«)] m = } y E[djd m *]e 



j(Ui-u,„)n 



;=1 



Because u)i are distinct, the differences W; — uo m are distinct for fixed m. So the above sum is 
independent of n if and only if E[d;d* ] = for i ^ m. The second condition of the lemma is, 
therefore, necessary as well. □ 



From Eq. (7.19), it follows that R(k) has the form 

L 

R{k) = Y J E[\d i \ 2 \ & 



JUJjk 



(7.20) 



so that the power spectrum is a line spectrum. Thus, if d{ satisfies the conditions of the lemma, 
then x(ri) is not only WSS, but is also a line spectral process of degree L. Summarizing, we have 
proved this: 

Theorem 7.3. Consider a random process of the form 

L 

x (n)=^diZ Mn , (7.21) 

(=1 

where d- t are random variables and U>,- are distinct constants. Then, x(n) is a WSS and, hence, a line 
spectral process if and only if 

1. E[dj] = whenever u>; 7^ (zero-mean condition). 

2. E[d;d*] = for i 7^ m (orthogonality condition). 
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Under this condition, R(k) is as in Eq. (7.20), so that E[\d{\ ] represents the power at the line 
frequency 0J{. 

Example 7.1: Stationarity and Ergodicity of Line spectral Processes. Consider the random 
process x(n) = Ae JUJ °", where u)q 7^ is a constant and A is a zero-mean random variable. We see 
that£[x(w)] = E[A\e^° n = and E[x(n)x*(n - k)\ = £[|y4| 2 ]e^.Becausethese are independent 
of n, the process x(n) is WSS. 

If a process x[n) is ergodic (Papoulis, 1965), then, in particular, i?[|x(«)|] must be equal to 
the time average. In our case, because |#(«)| = \A\, we see that 

i A 

1 V^ I../..M2 _ I ,,12 



2N+1 ^ 

n=-N 



\x(n)\ l = \A\ 



and is, in general, a random variable! So the time average will in general not be equal to the ensemble 
average E[\A\ ], unless \A\ is nonrandom. (For example, if A = ce J , where c is a constant and 6 is 
a random variable, then \A\ would be a constant.) So we see that, a WSS line spectral process is 
not ergodic unless we impose some strong restriction on the random amplitudes. 

Example 7.2: Real Random Sinusoid. Next consider a real random process of the form 

x(n) = A cos(coon + 9), 

where A and are real random variables and loq > is a constant. We have 

x{n) = 0.5At j6 t JUJo " +0.5Ae- j9 e- ja "> n . 

From Lemma 7.1, we know that this is WSS if and only if 

E[Ae j8 }=0, E[Ae~ je } = 0, and E[A V 29 ] = 0. (7.22) 

This is satisfied, for example, if A and are statistically independent random variables and is 
uniformly distributed in [0, 27t). Thus, statistical independence implies 

E[Ae je ] =E\A]E\t je ] and E[A 2 t fte ) = E\A 2 }E[t fle ], 

and the uniform distribution of implies JB[e-^] = iife- 729 ] = 0, so that Eq. (7.22) is satisfied. 
Notice that Lemma 7.1 requires thatAe-' andvfe - - 7 have zero mean for WSS property, but it is not 
necessary for A itself to have zero mean! Summarizing, x(n) = A cos(cjo" + @) is WSS whenever 
the real random variables A and are statistically independent and is uniformly distributed in 
[0,27r]. It is easily verified that £[*(«)] = and 

R(k) = E[x{n)x{n - A)} = 0.5£[^ 2 ]cos(u;o/£) = Pcos(u} k), 
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where P = 0.5E[A ]. So the power spectrum is 

M e> ) = - x2ir(s t (u-uo) + Sa(u + Uo)J, 0<uj<2tt. 

The quantity P is the total power in the two line frequencies ztwo- 

Example 7.3: Sum of Sinusoids. Consider a real random process of the form 

N 

x(n) = y^A cos(cj;k + 6i). (7.23) 

Here ^,- and 0; are real random variables and w,- are distinct positive constants. Suppose we make 
the following statistical assumptions: 

1. A t and m are statistically independent for any i, m. 

2. 0i and 9 m are statistically independent for i ^ m. 

3. m is uniformly distributed in < 6 m < 2tt for any m. 

We can then show that x{n) is WSS. For this, we only have to rewrite x(n) in the complex form 
Eq. (7.21) and verify that the conditions of Theorem 7.3 are satisfied (Problem 25). Notice that 
no assumption has been made about the joint statistics ofy^- andA m , for i ^ m. Under the above 
conditions, it can be verified that £[#(«)] = 0. How about the autocorrelation? We have 

R(k) = E[x(n)x(n - k)\ 

= E[x(0)x(-k)} (using WSS) 

N N 

= E^A; COS(0,) /^m COs(-kuJ m + 9 m ). 
i—1 m—1 

Using the assumptions listed above, the cross terms for i ^ m reduce to 

E\A i A m }E[co${d i )}E[cos{-kuj m + 9 m )] = 0. 
When i = m, the terms reduce to P; cos(o;,-^) as in Ex. 7.2. Thus, 

N 

R(k) = E[x(n)x(n - k)] = ^ P,-cos(u;,vfc), 

i=l 

where P; = O.SE[A i ] > 0. The power spectrum is therefore 

S xx {e JUJ ) =*J2Pi U t (u - u)i) + 5 a {co + Ui) J , < lo < 2tt. 

i=l ^ ' 

The total power at the frequencies u> ( - and — uoi is given by 0.5P, + 0.5P; = P,-. 
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cos(co «) 




FIGURE 7.3: Plot of the cosine. 

Philosophical Discussions. A random process of the form Eq. (7.21), where there are only/, 
random variables d-„ always appears to hold some conceptual mystery. If we measure the L random 
variables d{ by some means, then we know the entire waveform x\n). So the samples are fully 
predictable, as also seen from the difference equation Eq. (7.9). Furthermore, a plot of x(n) does 
not look "random." For example, considers? cos(cJo« + &) discussed in Ex. 7 .2. Here, the random 
variables^ and 6 do not depend on n, and the plot of x(n) as a function of n (Fig. 7.3) is just a nice 
and smooth cosine! 

So, where does the randomness manifest? Recall that a random process, by definition, is 
a collection (or ensemble) of waveforms (Papoulis, 1965; Peebles, 1987). Each waveform is an 
outcome of an experiment. For example, the result of an experiment determines the values of the 
random variables A and 9 to be Aq and 9q, and the outcome Aq cos(u;o w + $o) is fully determined 
for all n and is said to be a realization of the random process. It is for this reason that it is somewhat 
tricky to force ergodicity (Ex. 7.1); ergodicity says that time averages are equal to ensemble averages, 
but if a particular outcome (time function) of the random process does not exhibit "randomness" 
along time, then the time average cannot tell us much about the ensemble averages indeed. 

7.5 PREDICTION POLYNOMIAL OF LINE SPECTRAL 
PROCESSES 

Let x(n) be Linespec(Z,) and let Rx+i be its autocorrelation matrix of size (L + 1) X (L + 1). In 
Section 7.2, we argued that there exists a unique vector v of the form 

V = [ 1 Vi ...Vl] 

such that R^+iv = 0. By comparison with the augmented normal equation Eq. (2.20), we see that 
if we take the coefficients of the Z,th-order predictor to be 

«L.i = Vi, 
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then the optimal Z,th-order prediction error £ L becomes zero. The optimal predictor polynomial 
for the process x(n) is therefore 



A L (z) = 1 + v\z 1 + . . . + v* L z~ 



V{z). 



In Section 7.3.2, we saw that V(z) has all the L zeros on the unit circle, at the points e JU ', where W; 
are the line frequencies. We therefore conclude that the optimal Z,th-order predictor polynomial 
A^{z) of a Linespec(Z,) process has all L zeros on the unit circle. This also implies 



i L-n 



for some r with \c\ = 1. For example, in the real case, a n is symmetrical or antisymmetrical. In what 
follows, we present two related results. 

Lemma 7.2. Let R(k) be the autocorrelation of a WSS process x(n), and let R(k) satisfy the 
difference equation 



R(k) = -J2 **R(* 



(7.24) 



=i 



Furthermore, let there be no such equation of smaller degree satisfied by R(k). Then the process 
x(n) is line spectral with degree L and the polynomial ^(z) = 1 + X/=i a ^ z ~' necessarily has all 
the L zeros on the unit circle. 

Proof. By using the property R(k) = R*(—k),we can verify that Eq. (7.24) implies 



1 

a L 



' R(0) R0-) ■ 


.. R(L) 


R*(l) R(0) . 


..R(L-l) 


R*{L) R*(L - 1) . 


.. R(0) 



So R-L+i is singular. If R^, were singular, we could proceed as in Section 7.2.1 to show that x(n) 
satisfies a difference equation of the form Eq. (7.9) (but with lower-degree L — 1); from this, we 
could derive an equation of the form Eq. (7.24) but with lower degree L — 1. Because this violates 
the conditions of the lemma, we conclude that Rj, is nonsingular. Using Theorem 7.1, we therefore 
conclude that the process x[n) is Linespec(-L). It is clear from Section 7.3 that the polynomial 
Al(z) is the characteristic polynomial V(z) of the process x(n) and therefore has all zeros on the 
unit circle. □ 
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We now present a related result directly in terms of the samples x(n) of random process. 
This is a somewhat surprising conclusion, although its proof is very simple in light of the above 
discussions. (For a more direct proof, see Problem 23.) 

Theorem 7.4. Let x\n) be a random process satisfying 

L 

x(n) = — > a*x(n — i). {7.2$) 

(=1 

Furthermore, let there be no such equation of smaller degree satisfied by x(n). If the process x(n) is 
WSS, then 

1. The polynomial y^i(z) = 1 + Y]:—-t #*z~' has all the L zeros on the unit circle. 

2. x(n) is line spectral with degree/,. <0 

Thus, if a process satisfies the difference equation Eq. (7.25) (and if L is the smallest such number), 
then imposing the WSS property on x(n) immediately forces the zeros of ^l(z) to be all on the 
unit circle! 

Proof. Suppose the process is WSS. If we multiply both sides of Eq. (7.25) by x*(n — k) 
and take expectations, this results in Eq. (7.24). If there existed an equation of the form Eq. (7.24) 
with L replaced by L — 1, we could prove then that R^ is singular and proceed as in Section 7.2.1 
to derive an equation of the form Eq. (7.25) with L replaced by L — 1. Because this violates the 
conditions of the theorem, there is no smaller equation of the form Eq. (7.24). Applying Lemma 
7.2, the claim of the theorem immediately follows. □ 

7.6 SUMMARY OF PROPERTIES 

Before we proceed to line spectral processes buried in noise, we summarize, at the expense of being 
somewhat repetitive, the main properties of line spectral processes. 

1. Spectrum has only impulses. We say that a WSS random process is Linespec(Z,) if the power 
spectrum is made of L impulses, that is, 

L 

£«( e> ) = 27T Y1 CiS *( u ~ L0 '^ 0<u < 2tt, (7.26) 

with Ci > for any i. Here, CJ; are distinct and called the line frequencies and c; are the 
powers at these line frequencies. In this case, the process x(n) as well as its autocorrelation 
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R(k) satisfy an Z,th-order recursive, homogeneous, difference equation: 

L L 

x(n) = - Y^ v*x(n - i) , R(k) = - ^ v*R(k -i). ( 7.27) 

i=l i=i 

So all the samples of the sequence x(n) can be predicted with zero error, if L successive 
samples are known. Similarly, all the autocorrelation coefficients R(k) of the process x(n) 
can be computed if R(k) is known for < k < L — 1. 

2. Characteristic polynomial . In the above, v* are the coefficients of the polynomial 

L 

V(z) = l + ^2v* i z~ i - {7.28) 

i=\ 

This is called the characteristic polynomial of the Linespec(Z,) process *(»).This polynomial 
has all its L zeros on the unit circle at z = e JLJ; . 

3. Sum of sinusoids. The random process x(n) as well as the autocorrelation R(k) can be 
expressed as a sum of L single-frequency signals 

L L 

x („) = Y di& ju " n , R{k) = Y «i<- M * > ( 7 - 29 ) 

i=l (=1 

where d- t are random variables (amplitudes at the line frequencies u>,-) and c; = isfl^j 2 ] 
{powers at the line frequencies U>i). 

4. Autocorrelation matrix. A WSS process x{n) is Linespec(Z,) if and only if the autocorrelation 
matrix Rj,+i is singular and Rj, nonsingular. So R^+i has rank L. There is a unique vector 
v of the form 



1 v\ . . . v L 



T 



(7. JO) 



satisfying R^ + iv = 0, and this vector uniquely determines the characteristic polynomial 
V{z). Furthermore, the rank has the saturation behavior sketched in Fig. 7.2. 

5. Sum of sinusoids, complex. A random process of the form x(n) = ^J;=i die JU>i " is WSS and, 
hence, line spectral, if and only if (a) E[d;] = whenever u>,- 7^ and (b) E[d;d* ] = for 
i / m. Under this condition, R(A) = ^f =1 E[\di\ 2 ]e-' u < k (Theorem 7.3). 

6. Sum of sinusoids, real. Consider the process x(n) = ^ ( =i-^< cos(o;j« + di), where A{ and 
0; are real random variables and W; are distinct positive constants. This is WSS, hence, 
line spectral, under the conditions stated in Ex. 7.3. The autocorrelation is then R(i) = 
yi-_i Pi cos(ujjk), where P; = 0.5JS[^ ( ] is the total power in the line frequencies iu;,-. 
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7. Identifying the parameters. Given the set of autocorrelation coefficients R(i), < k < L of 
the Linespec(-L) process x(n), we can determine V{ in Eq. (7.28) from the eigenvector v 
of R.£+i corresponding to zero eigenvalue. The line frequencies U0i can then be determined 
because e JUJi are the zeros of the characteristic polynomial V(z). Once the line frequencies 
are known, the constants q are determined as described in Section 7.3.4. 

8. Periodicity. If a WSS process x(ti) is such that R(0) = R(M) for some M > 0, then x(n) is 
periodic with period M, and so is R(k) (Theorem 7.2). So the process is line spectral. 

9. Prediction polynomial . If we perform optimal linear prediction on a Linespec(Z,) process, 
we will find that the Z,th-order prediction polynomial Aj\z) is equal to the characteristic 

polynomial V(z) and therefore has all its zeros on the unit circle. The prediction error 

cf 



ef = o. 



7.7 IDENTIFYING A LINE SPECTRAL PROCESS IN NOISE 

Identification of sinusoids in noise is an important problem in signal processing. It is also related 
to the problem of finding the direction of arrival of a propagating wave using an array of sensors. 
An excellent treatment can be found in Therrien (1992). We will be content with giving a brief 
introduction to this problem here. Consider a random process 

y(n) = x(n) + e(n), {7-31) 

where x(n) is Linespec(-L) (i.e., aline spectral process with degree/,). Let e[n) be zero-mean white 
noise with variance 07, that is, 

E[e{n)e*{n- k)) =a 2 e 5(k). 

We thus have a line spectral process buried in noise (popularly referred to as sinusoids buried in 
noise). The power spectrum of x[n) has line frequencies (impulses), whereas that of e\n) is fiat, 
with value a e for all frequencies. The total power spectrum is demonstrated in Fig. 7.4. Notice that 
some of the line frequencies could be very closely spaced, and some could be buried beneath the 
noise level. 

Assume that x(n) and e[m) are uncorrelated, that is, 

E\x{n)e*(m)\ =0, for any «, #2. 
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FIGURE 7.4: A Linespec(Z,) process with additive white noise of variance a 2 



Let r(A) and R(A) denote the autocorrelation sequences of y(n) and x(n), respectively. Using this 
and the whiteness of e(n), we get 

r(k) = E\y(n)y*(n - A)] 

= E[x(n)x*(n - A)] + E[e(n)e*(n - A)] 
= R(A)+a 2 5(A), 

which yields 

r(A) = R(A) + a 2 5(A) (7.32) 

Note, in particular, that r(A) = R(A), A 7^ 0. Thus, the m X m autocorrelation matrix t m of the 
process_y(n) is given by 

r m = R-m + o" e I m) (7.33) 

where R,„ is the m X m autocorrelation matrix of x(n) . We say that R m is the signal autocorrelation 
matrix and t m is the signal-plus-noise autocorrelation matrix. 

7.7.1 Eigenstructure of the Autocorrelation Matrix 

Let rji be an eigenvalue of R„, with eigenvector u,-, so that 



Then 



Thus, the eigenvalues of i m are 



R^U, = 7] ,11,. 

[R m + &jl„] u, = [rji + a 2 ]u;. 



A; = rji + a 2 . 
The corresponding eigenvectors of r,„ are the same as those of R„ 
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FIGURE 7.5: (a) The decreasing minimum-eigenvalue of the autocorrelation matrix R OT as the size , 
increases and (b) corresponding behavior of the minimum eigenvalue of r m . 



The Minimum Eigenvalue. Because x(n) is Linespec(Z,), its autocorrelation matrix R m 
is nonsingular for 1 < m < L and singular for m = L + 1 (Theorem 7.1). Thus, the smallest 
eigenvalue of R m is positive for m < L and equals zero for m > L. It is easily shown (Problem 
12) that the smallest eigenvalue A m i n (R m ) of the matrix R m cannot increase with ot. So it behaves 
in a monotone manner as depicted in Fig. 7.5(a). 

Accordingly, the smallest eigenvalue of t m behaves as in Fig. 7.5(b). In particular, for m > L 
it attains a constant value equal to a e . We can use this as a means of identifying the number of 
line frequencies L in the process x(n), as well as the variance a e of the noise e(n). (This could be 
misleading in some cases, as we cannot keep measuring the eigenvalue for an unlimited number of 
values of m; see Problem 29.) 

Eigenvector corresponding to the smallest eigenvalue. Let v be the eigenvector of t L+ \ 
corresponding to the smallest eigenvalue o e . So 



In view of Eq. (7.33) this implies 



that is, 



r L+1 v = a e \. 



Rl+iv + of v = of v, 



(7,34) 



R 



i+iv 



0. 
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Thus, v is also the eigenvector that annihilates R/,_|_i. The eigenvector corresponding to the 
minimum eigenvalue of r/,+1 can be computed using any standard method, such as, for example, 
the power method (Golub and Van Loan, 1989). Once we identify the vector v, then we can find 
the characteristic polynomial V(z) (Eq. (7.28)). Its roots are guaranteed to be on the unit circle 
(i.e., have the form e ■'"')> revealing the line frequencies. The smallest eigenvalue of r^ + i gives the 



noise variance <7T. 



□ 



7.7.2 Computing the Powers at the Line Frequencies 

The autocorrelation R(i) of the process x[n) has the form (7.3), where q are the powers at the line 
frequencies u;,-. Because we already know a;,-, we can compute c ( - by solving linear equations (as in 
Eq. (7.16)), if we know enough samples of R(i). For this note that R(i) is related to the given 
autocorrelation r(k) as in Eq. (7.32). So we know all the samples of R(k) (because a e has been 
identified as described above). 

What if we do not know the noise variance? Even if the noise variance o 2 e and, hence, 
R(0) are not known, we can compute the line powers as follows. From Eq. (7.32) we have 
R(k) = r(k), k 7^ 0. So we know the values of R(l), ■ ■ ■ , R(L). We can write a set of L linear 
equations 



(7.35) 



The L x L matrix above is Vandermonde and nonsingular because the line frequencies u>{ are 
distinct (Horn and Johnson, 1985). So we can solve for the line powers q uniquely. □ 

The technique described in this section to estimate the parameters of a line spectral process 
is commonly referred to as Pisarenko's harmonic decomposition. Notice that the line frequencies need 
not be harmonically related for the method to work (i.e., u>k need not be integer multiplies of a 
fundamental lOq). 

Case of Real Processes. The case where x(n) is a real line spectral process is of considerable 
importance. From Ex. 7.3 recall that a process of the form 

N 
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c(n) = /_j^i cos(tOin + 0,) 



i=i 



is WSS and hence, harmonic, under some conditions. If x(n) in Eq. (7.31) has this form, then the 
above discussions continue to hold. We can identify the eigenvector v of r^ + i corresponding to 
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its minimum eigenvalue (Eq. (7.34)) and, hence, the corresponding line frequencies as described 
previously. In this case, it is more convenient to write R(i) in the form 

N 

1=1 

where P ( - > is the total power at the frequencies ±u>,-. Because R(k) is known for 1 < k < N, we 
can identify P ( - by writing linear equations, as we did for the complex case. 

Example 7.4: Estimation of sinusoids in noise. Consider the sum of sinusoids 

x(n) = 0.3 sin(O.lTrw) + 1.6 sin(0.57rw) +2.1 sin(0.97rw), (7.J<5) 

and assume that we have noisy samples 

y(n) = x(n) + ?(«), 

where e{n) is zero-mean white noise with variance a e = 0.1. We assume that 100 samples of y(n) 
are available. The signal component x(n) and the noise component e\n) generated using the Matlab 
command randn are shown in Fig. 7.6. The noisy data_y(?z) can be regarded as a process with a line 
spectrum (six lines because each sinusoid has two lines), buried in a background noise spectrum 
(as in Fig. 7.4). We need the 7x7 autocorrelation matrix R7 of the process y(n), to estimate the 
lines. So, from the noisy data y(n), we estimate the autocorrelation R(k) for < k < 6 using the 
autocorrelation method described in Section 2.5. This estimates the top row of R7: 

3.6917 -2.0986 0.5995 -1.4002 2.1572 -0.1149 -1.8904 

From these coefficients, we form the 7x7 Toeplitz matrix R7 and compute its smallest eigenvalue 
and the corresponding eigenvector 

-iT 

0.6065 0.0083 -0.3619 0.0461 -0.3619 0.0083 0.6065 

With the elements of v regarded as the coefficients of the polynomial V(z), we now compute the 
zeros of V(z) and obtain 

0.0068 ± 0.9999/, 0.9414 ± 0.3373/, -0.9550 ± 0.2965/. 

From the angles of these zeros, we identify the six line frequencies iu>i, i^2> and ±6^3. The three 
sinusoidal frequencies U)\ , LO2, and 0J3 are estimated to be 

0.1029tt, 0.4985tt, and 0.9034vr, 
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FIGURE 7.6: Example 7.4. Estimation of sinusoids in noise. The plots show the signal and the noise 
components used in the example. 

which are quite close to the correct values given in Eq. (7.36), namely, 0.17T, 0.57T, and 0.97T. With 
UJk thus estimated and R(k) already estimated, we can use (7.35) to estimate c/, and finally obtain 
the amplitudes of the sinusoids (positive square roots of q in this example). The result is 

0.3058, 1.6258, and 2.1120, 
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which should be compared with 0.3, 1.6, and 2.1 in Eq. (7.36). Owing to the randomness of noise, 
these estimates vary from experiment to experiment. The preceding numbers represent averages 
over several repetitions of this experiment, so that the reader sees an averaged estimate. 

Large number of measurements. In the preceding example, we had 100 noisy measurements 
available. If this is increased to a large number such as 5000, then the accuracy of the estimate 
improves. The result (again averaged over several experiments) is 

0.10017T, 0.50007T, and 0.9001vr 

for the line frequencies and 

0.2977, 1.5948, and 2.1019 

for the line amplitudes. If the noise is large, the estimates will obviously be inaccurate, but this 
effect can be combated if there is a large record of available measurements. Thus, consider the above 
example with noise variance increased to unity. Fig. 7 '.7 shows the signal and noise components 
for the first few samples — the noise is very significant indeed. With 10,000 measured samples y(n) 
used in the estimation process (an unrealistically large number), we obtain the estimate 

0.1005tt, 0.4999tt, 0.9000tt, 

for the line frequencies and 

0.3301, 1.6140, and 2.1037 

for the line amplitudes. Considering how large the noise is, this estimate is quite good indeed. 
Figure 7.8 shows even larger noise, almost comparable with the signal (noise variance a e = 2). 
With 10,000 measured samples y(n) used in the estimation process , we obtain the estimate 



for the line frequencies and 



0.1030tt, 0.5039tt and 0.9010vr 



0.2808, 1.6162, and 2.0725 



for the line amplitudes. □ 

In this section, we have demonstrated that sinusoids buried in additive noise can be identified 
effectively from the zeros of an eigenvector of the estimated autocorrelation. The method outlined 
above reveals the fundamental principles as advanced originally by Pisarenko (1973). Since then, 
many improved methods have been proposed, which can obtain more accurate estimates from a small 
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FIGURE 7.7: Example 7.4. Estimation of sinusoids in large noise. The plots show the signal and the 
noise components used in the example. 



number of noisy measurements. Notable among these are minimum-norm, methods (Kumaresan and 
Tufts, 1983), MUSIC (Schmidt, 1979, 1986), and ESPRIT (Paulraj et al, 1986; Roy and Kailath, 
1989). These methods can also be applied to the estimation of direction of arrival information in 
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FIGURE 7.8: Example 7.4. Estimation of sinusoids in very large noise. The plots show the signal and 
the noise components used in the example. 



array processing applications (Van Trees, 2002). In fact, many of these ideas originated in the array 
processing literature (Schmidt, 1979). A review of these methods can be found in the very readable 
account given by Therrien (1992). 



LINE SPECTRAL PROCESSES 111 

7.8 LINE SPECTRUM PAIRS 

In Section 5.6, we discussed the application of linear prediction theory in signal compression. We 
explained the usefulness of the lattice coefficients {k m } in the process. Although the quantization 
and transmission of lattice coefficients {k m } guarantees that the reconstruction filter \/Aj^[z) 
remains stable, the coefficients k m are difficult to interpret from a perceptual viewpoint. If we knew 
the relative importance of various coefficients k m in preserving perceptual quality, we could assign 
them bits according to that. But this has not been found to be possible. 

In speech coding practice, one uses an important variation of the prediction coefficients called 
line-spectrum pairs (LSP). The motivation comes from the fact that the connection between LSP 
and perceptual properties is better understood (Itakura, 1975; Soong and Juang, 1984, 1993; Kang 
and Fransen, 1995). The set of LSP coefficients not only guarantees stability of the reconstruction 
filter under quantization, but, in addition, a better perceptual interpretation in the frequency domain 
is obtained. To define the LSP coefficients, we construct two new polynomials 



P{z)=A N {z)+z^ N+1 U N {z), 



G(*)=4 



n{z 



<N+1)7 



N(Z 



(7.37) 



Notice that these can be regarded as the polynomial A^ + i(z) in Levinson's recursion for the cases 
&n+i = 1 and kjf+i = —1, respectively. Equivalently, they are the transfer functions of the FIR 
lattice shown in Fig. 7.9. 

Let us examine the zeros of these polynomials. The zeros of P(z) and Q(z) are solutions of 
the equations 
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FIGURE 7.9: Interpretation of the LSP polynomials P(z) and Q(z) in terms of the FIR LPC lattice. 
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Because the left-hand side is all-pass of order N-\- 1 with all zeros in \z\ < 1, it satisfies the 
modulus property 



-{N+l) A 



N{Z 



M*) 



< 1 for|z| > 1 
> 1 for|z| < 1 
= 1 forlzl = 1 



See Vaidyanathan (1993, p. 75) for a proof. Thus, the solutions of the preceding equations are 
necessarily on the unit circle. That is, all the N + 1 zeros of P(z) and similarly those of Q(z) are 
on the unit circle. 

Alternation Property. In fact, we can say something deeper. It is well known (Vaidyanathan, 
1993, p. 76) that the phase response (f)(co) of the stable all-pass filter 
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FIGURE 7.10: Development of LSPs. (a) Monotone phase response of z"( iv+1 ^7v(z)/^7v(z) and (b) 
zeros of P(z) and Q(z) for the cases N= 4 andiV= 5. 
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FIGURE 7.11: Examples of LSP coefficients for N = A and N = 5. The unit-circle coefficients in the 
lower half are redundant and therefore dropped. 



is a monotone decreasing function, spanning a total range of 2(N + 1J7T (Fig. 7.10(a)). Thus, the 
7V+ 1 zeros of P(z) and <2(z) alternate with each other as demonstrated in Fig. 7.10(b) for iV = 4 
and N = 5. The angles of these zeros are denoted as W; and #,-. 

Because A^(z) has real coefficients in speech applications, the zeros come in complex 
conjugate pairs. Note that P{— 1) = for even N and Q{— 1) = for odd iV. Moreover, 
g(l) = and P(l) / regardless of N. These follow from the fact that ^(±1) =A N (±1). 
Thus, if we know the zeros of P(z) and Q(z) in the upper half of the unit circle (and excluding 
z = ±1), we can fully determine these polynomials (their constant coefficients being unity by the 
definition (7.37)). There are N such zeros (lOk and Ok), as demonstrated in Fig. 7.11 for N = 4 
and TV = 5. These are called line spectrum pairs (LSP) associated with the predictor polynomialyfTv(z). 
For example, the LSP parameters for N = 4 are the four ordered frequencies 



U\ < Q\ < U>2 < &2 
and the LSP parameters for N = 5 are the five ordered frequencies 

UJ\ < Q\ < Ld2 < 02 < <jJ%- 



(7.38) 



(7.39) 



Properties and Advantages of LSPs. Given the iV coefficients of the polynomial An(z), 
the N LSP parameters can be uniquely identified by finding the zeros of P(z) and Q(z). These 
parameters are quantized, making sure that the ordering (e.g., Eq. (7.39)) is preserved in the 
process. The quantized coefficients are then transmitted. Because P(z) and Q(z) can be computed 
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uniquely from the LSP parameters, approximations of these polynomial can be computed at the 
receiver. From these, we can identify an approximation of A^{z) because 

A N {z) = {P{z) + Q{z))/2. 

The speech segment can then be reconstructed from the stable filter 1/A^[z) as described in 
Section 5.6. Several features of the LSP coding scheme are worth noting. 

1. Stability preserved '. As long as the ordering (e.g., Eq. (7.39)) is preserved in the quantization, 
the reconstructed version ofy^v(z) is guaranteed to have all zeros in |z| < 1. Thus, stability 
of l/y^7v(z) is preserved despite quantization as in the case of lattice coefficient quantization. 

2. Perceptual spectral interpretation. Unlike the lattice coefficients, the LSP coefficients are 
perceptually better understood. To explain this, recall first that for sufficiently large N, the 
AR(N) model gives a good approximation of the speech power spectrum S xx (e JU ). This 
approximation is especially good near the peak frequencies, called the formants of speech. 
Now, the peak locations correspond approximately to the pole angles of the filter 1/A^(z). 
Near these locations, the phase response tends to change rapidly. The same is true of the 
phase response 4>(<jj) of the all-pass filter, as shown in Fig. 7.12(b). The LSP coefficients, 
which are intersections of the horizontal lines (multiples of it) with the plot of 4>(io), 
therefore tend to get crowded near the formant frequencies. Thus, the crucial features of 
the power spectrum tend to get coded into the density-information of the LSP coefficients 
at various points on the unit circle. This information allows us to perform bit allocation 
among the LSP coefficients (quantize the crowded LSP coefficients with greater accuracy). 
For a given bit rate, this results in perceptually better speech quality, as compared with 
the quantization of lattice coefficients. Equivalently, for a given perceptual quality, we can 
reduce the bit rate; in early work, an approximately 25% saving has been reported using 
this idea, and more savings have been obtained in a series of other papers. Furthermore, as 
the bit rate is reduced, the speech degradation is found to be more gradual compared with 
lattice coefficient quantization. 

3. Acoustical tube models. It has been shown in speech literature that the lattice structure is 
related to an acoustical tube model of the vocal tract (Markel and Gray, 1976). This is 
the origin of the term reflection coefficients. The values ^jv+i = il i n Fig- 7.9 indicate, for 
example, situations where one end of the tube is open or closed. Thus, the LSP frequencies 
are related to the open- and close-ended acoustic tube models. 

4. Connection to circuit theory . In the theory of passive electrical circuits, the concept of reactances 
is well-known (Balabanian and Bickart, 1969). Reactances are input impedances of electrical 
LC networks. It turns out that reactances have a pole-zero alternation property similar to the 
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FIGURE 7.12: Explanation of how the LSP coefficients tend to get crowded near the formant regions 
of the speech spectrum, (a) A toy power spectrum with one formant and (b) the phase response of the 
all-pass filter z~( n+1 'An(z) /Ajsriz), where An{z) is the prediction polynomial (see text). 



alternation of the zeros of the LSP polynomials -P(z) and Q(z). In fact, the ratio P(z) / Q(z) 
is nothing but a discrete time version of the reactance function. 

7.9 CONCLUDING REMARKS 

In this chapter, we presented a rather detailed study of line spectral processes. The theory finds 
applications in the identification of sinusoids buried in noise. A variation of this problem is 
immediately applicable in array signal processing where one seeks to identify the direction of arrival 
of a signal using an antenna array (Van Trees, 2002). Further applications of the concepts in signal 
compression, using LSPs, was briefly reviewed. 
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CHAPTER 8 




Linear Prediction Theory for Vector 

Processes 




8.1 INTRODUCTION 

In this chapter, we consider the linear prediction problem for vector WSS processes. Let x(n) 
denote the I x 1 vector sequence representing a zero-mean WSS process. Thus, _E[x(w)] = and 
the autocorrelation is 

R(k) = E[x(n)x\n-k)}, (8.1) 

where x* (n) represents transpose conjugation, as usual. Note that R(^) is independent of n (owing 
to WSS property) and depends only on k. The autocorrelation is a matrix sequence with each 
element R(^) representing an L X L matrix. Furthermore, it is readily shown from the definition 
(8.1) that 

R\-k)=R(k). (8.2) 

For a vector WSS process characterized by autocorrelation R(k), we now formulate the linear 
prediction problem. Most of the developments will be along lines similar to the scalar case. But 
there are some differences between the scalar and vector cases that need to be emphasized. A short 
and sweet introduction to this topic was given in Section 9.6 of Anderson and Moore (1979). We 
shall go into considerably greater details in this chapter. 

8.2 FORMULATION OF THE VECTOR LPC PROBLEM 

The forward predictor of order TV predicts the sample x(n) from a linear combination of past N 
samples: 

x(n) = -a) Nl x(n - 1) - a} N2 x(n - 2) ... - aj V7V x(w-iV). (8.3) 

The forward prediction error is therefore 

e w( w ) = x (») - x(») = x(«) + aj vl x(« - 1) + ^ N2 x(n - 2) ... + a] VJV x(« - N) (8.4) 
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and the forward prediction polynomial is 

A 7V (z) =I L + aj^z" 1 + al 2 z~ 2 + ... + 4j,n z ~ N - ( 8 - 5 ) 

The optimal predictor has the L x L matrix coefficients a^> chosen such that the total mean square 
error in all components is minimized, that is, 

S^E[{4{n))^ f N {n)] (8.6) 

is minimized. This quantity is also equal to the trace (i.e., sum of diagonal elements) of the forward 
prediction error covariance matrix 

E f N = E[4{n){4{n)% (8.7) 

Note that this is an L x L matrix. The goal therefore is to minimize 

s'±Tr(E£) (8.8) 

by choosing a^ appropriately. The theory of optimal predictors for the vector case is again based 
on the orthogonality principle: 

Theorem 8.1. Orthogonality principle (vector processes). The iVth-order forward linear predictor 
for a WSS vector process x(n) is optimal if and only if the error c N (n) at any time n is orthogonal 
to the N past observations x(n — 1), . . . ,x(n — N), that is, 

E[4(n)x\n -k)] = 0, 1 < k < N, (8.9) 

where the right-hand side represents the L X L zero-matrix (with L denoting the size of the 
column vectors x(«)). <)> 

Proof. Letx^(w) be the predicted value for which the error e_i_(w) satisfies the orthogonality 
condition (8.9), and letx(«) be another predicted value with error e(n) (superscript f and subscript 
TV deleted for simplicity). Now, 

e(«) = x(n) — x(n) = x(n) — x±(n) + xj_(») — x(n), 

so that 

e („) = e± (n) +x±(n) -x(n). (8.10) 

Now, the estimates x±(n) andx(«) are, by construction, linear combinations of x(n — k), 1 < k < 
A''. Because the error ej_(n), by definition, is orthogonal to these past samples, it follows that 

£[e(«)e t (n)\ = E[e ± (n)e^(n)\ +E[(x ± (n) -x(n))(x ± (n) -x~(»)) f ] (8.11) 

call this E call this E ± 
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Because the second term on the right-hand side is a covariance, it is positive semidefmite. The 
preceding therefore shows that 

E - E x > {8.12) 

where the notation A > means that the Hermitian matrix A is positive semidefmite. Taking trace 
on both sides, we therefore obtain 

S>S±. {8.13) 

When does equality arise? Observe first that £ — Ex = Tr(E — Ej_). Because E — Ej_ has been 
shown to be positive semidefmite, its trace is zero only when E — Ej_ = 0. From (8.11), we see 
that this is equivalent to the condition 

E[{xx{n) -x{n)){xx{n) -x(«)) f ] = 0, 

which, in turn, is equivalent to x_l(k) — x{n) = 0. This proves that equality is possible if and only 

ifx(w) =x_l(w). □ 

Optimal Error Covariance. The error covariance E£ for the optimal predictor can be 
expressed in an elegant way using the orthogonality conditions: 

E f N = E[4{n){4{n))^]=E[4{n){x{n) -x{n))"\ = E[4{n)x\n)\. 

The third equality follows by observing that the optimal x{n) is a linear combination of samples 
x{n — k), which are orthogonal to 4{n). Using Eq. (8.4), this can be rewritten as 



n) 



E^ = E [(x(k) + a] vl x(w - 1) + 4a x ( n - 2) ... + 4.N x { n ~ ^)) xt < 
= R(0) + a^R^l) + a^ 2 R f (2) + . . . + a^R f (A0 

Because the error covariance E^ is Hermitian anyway, we can rewrite this as 

E^ = R(0) + R(l)a 7V . 1 + R(2)a N , 2 + . . . + R^a^. {8.14) 

We shall find this useful when we write the augmented normal equations for optimal linear 
prediction. 



The trace of a matrix A is the sum of its eigenvalues. When A is positive semidefmite, all eigenvalues are 
nonnegative. So, zero-trace implies all eigenvalues are zero. This implies that the matrix itself is zero (because any 
positive semidefmite matrix can be written as A — UAU , where A is the diagonal matrix of eigenvalues and U is 
unitary). 
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8.3 NORMAL EQUATIONS: VECTOR CASE 

Substituting the expression (8.4) for the prediction error into the orthogonality condition (8.9), we 
get TV matrix equations. By using the definition of autocorrelation given in Eq. (8.1), we find these 
equations to be 



R(* ) + aj^RC* - 1) + n f N2 R(k - 2) + . . . + aUR(* -N)=0, 



(8.15) 



for 1 < k < N. Here the on the right is an L X L matrix of zeros. Solving these equations, we 
can obtain the TV optimal predictor coefficient matrices &N.k- For convenience, we shall rewrite the 
equations in the form 

R f 0* - l)a^! + R f (£ - 2)a^2 + ■ • ■ + R f (^ - N)* NfN = -& (k) (8.16) 

for 1 < k < N. By using the fact that R (— k) = R(£), these equations can be written in the 
following elegant form: 



R(0) 

R(-l) 



R(l) 
R(0) 



R(-iV+l) R(-N+2) 



R(N-l)' 




a;v.i 




'R(-l)" 


R(N-2) 




*N,2 


= - 


R(-2) 


R(0) 




»JV,JV 




_R(-N)_ 



(8.17) 



These represent the normal equations for the vector LPC problem. Note that these equations reduce 
to the normal equations for scalar processes given in Eq. (2.8) (remembering R' (— k) = R(i)). If 
we move the vector on the right to the left-hand side and then prefix Eq. (8.14) at the top, we get 
the following augmented normal equations for the optimal vector LPC problem: 



R(0) 
R(-D 



R(l) 
R(0) 



R(N) 
R(N- 1 



R(-N) R(-N+ 1) ... R(0) 







" h 




"e 7 ! 


) 




*N,l 


= 









&N,N 








.18) 



We will soon return to this equation and solve it. But first, we have to introduce the idea of 
backward prediction. 

8.4 BACKWARD PREDICTION 

The formulation of the backward predictor problem for the vector case allows us to derive a 
recursion similar to Levinson's recursion in the scalar case (Chapter 3). This also results in elegant 
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lattice structures. The iVth-order backward linear predictor predicts the Ix 1 vectorx(n — N— 1) 
as follows: 

x{n-N-l) = -b] v . 1 x(« - 1) - b] v2 x(» - 2) - ... - b] VA ,x(» - N), 

where b/v.* are L X L matrices. The backward prediction error is 

e h N („) = x (n-N-l) -x(n-N- 1), 

so that 

e h N (n) =x{n-N-l)+ bj v . 1 x(« - 1) + b] v . 2 x(« - 2) + . . . + b] V7V x(« - TV) . 

So the backward prediction polynomial takes the form 

B N ( Z ) = b^z" 1 + bj^z- 2 . . . + bj^z"" + z- {N+1) I. (8.19) 

The optimal coefficients b/v,* are again found by imposing the orthogonality condition (8.9), 
namely, 

E[c° N (n)x\n - k)} = 0, 1 < k < N 

This results in the TV equations 

b] vl R(^ - 1) + b]y i2 R(* - 2) + . . •b] V7V R(^ - JV) + R(* - iV- 1) = (8.20) 

for 1 < k < TV. The error covariance for the optimal backward predictor is 

E^ = E[c h N (n)(c h N (n)y] = E[e h N (n) (x(« - iV- l) - x(« - N- l)) f ] 
= £[eU")(x t («-iV-l)]. 

The third equality follows by observing that the optimal x(n — N — 1) is a linear combination of 
samples x(n — k) that are orthogonal to c N (n). So E w can be rewritten as 



F b — F 



x(n-N-l)+ V Nl x(n - 1) + . . . + W N N x(n -N))x\n-N-1 



>'U 



Using Eq. (8.1) and the facts that E N is Hermitian and R (— k) = R(<£), this can further be 
rewritten as 

E^ = R(-A0b 7V . 1 + . . . + R(-l)b^ + R(0). (8.21) 
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Combining this with the TV equations in Eq. (8.20) and rearranging slightly, we obtain the following 
augmented normal equations for the backward vector predictor: 



R(0) 

R(-l) 



R(l) 
R(0) 



R(-iV) R(-N+l) 



.. R(A0 
..R(N- 1 

.. R(0) 





bjv,i 




' 













ON,N 




E b 
. N . 



[8.22) 



8.5 LEVINSON'S RECURSION: VECTOR CASE 

We now combine the normal equations for the forward and backward predictors to obtain 
Levinson's recursion for solving the optimal predictors recursively. For this, we start from Eq. 
(8.18) and add an extra column and row to obtain 



R(0) 

R(-l) 



R(l) 
R(0) 



R(-N) R(-N+ 1) 
R(-N- 1) R(-N) 



R(iV) R(iV+ 1) 
R(N- 1) R(JV) 



R(0) 

R(-l) 



R(l) 

R(0) 





~ h ' 








»iV,l 


















{8.23) 



call this R 



■N+2 



where Fj^ is an I X I matrix, possibly nonzero, which comes from the extra row added at the 



bottom. Similarly, starting from Eq. (8 



R(0) R(l) . 


. R(JV) R(N+ 1) 


R(-l) R(0) . 


.R(iV-l) R(iV) 


R{-N) R(-N+\) . 


• R(o) R(i) 


R(-JV-l) R(-JV) . 


• R(-i) R(o) J 



22), we add an extra row and column to obtain 










bjv.i 







_ h _ 






E b 



{8.24) 



this is R7V+2 too! 

where F N comes from the extra row added at the top. We now postmultiply Eq. (8.24) by an L X L 
matrix (Kj^, 1 ) ' and add it to Eq. (8.23) with the hope that the last entry Fj^ in the right-hand side is 



Actually L extra columns and rows because each entry is itself an L x L matrix. 
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replaced with zero. This will then lead to the augmented normal equation for the (N-\- l)th-order 
forward predictor. We have 



R 



N+2 



I 


" h ' 









*N,1 


+ 


bjv,i 




&N,N 




DN,N 


V 







_ h _ 



) 




*^N 









(K^ 1 ) t 


= 





+ 





) 








E b 



CO* 



(8.25) 



Clearly, the choice of Kjy +1 , which achieves the aforementioned purpose should be such that 



Fj£ + E^(Kj£ +1 )t=0. 



At this point, we make the assumption that the L x L error covariance E^, and for future purposes, 

f . f 

E N , are nonsingular. The solution K^, +1 is therefore given by 



K +1 = -[F^]- 1 . 



We can similarly postmultiply Eq. (8.23) by an L x L matrix K w+1 and add it to Eq. (8.24): 



■ N 



'.26) 



R 



N+2 



t 


' h ' 




3.JV.1 




&N,N 


\ 






K 7V+1 + 





bjv,i 


) 




^N 




^N,N 
_ h _ 


J 






- r N_ 



Kjvj. i + 



VV+1 



"W. 



'.27) 



To create the augmented normal equation for the (N-\- 1) -order backward predictor, we have to 
reduce this to the form (8.22). For this, K. N+1 should be chosen as 



[E^ +1 +F^ = 0, 



that is, 



K 1 



N+l ~ l^Nl r N- 



'.28) 
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The matrices K^, +1 and K w+1 are analogous to the parcor coefficients k m used in scalar linear 
prediction. With K^ +1 and K 7V+1 chosen as above, Eqs. (8.25) and (8.27) reduce to the form 



R 



■N+2 



I 


" h ' 









&NA 


+ 


W,i 




&N,N 




Ijtv^tv 


V 







. h _ 



) 







K^ +1 ) f 


= 





) 








(8.29) 



R 



■7V+2 



/ 


" h ' 




*N,l 




&N,N 


V 






K 1 



■N+l 



+ 





bjv,i 


^ 










y 






E b 

_ 7V+1 



(8.30) 



h 




' h ' 







&N+1.1 


— 


*NA 


+ 


W,i 


&N+1,N 




&N,N 




^>N,N 


&N+1,N+1_ 









_ h _ 



respectively. These are the augmented normal equations for the (N -\- l)th-order optimal for- 
ward and backward vector predictors, respectively. Thus, the (N -\- l)th-order forward predictor 
coefficients are obtained from 



(d) 1 - 



Similarly, the (N -\- l)th-order backward predictor coefficients are obtained from 



h 

The forward and backward prediction polynomials defined in Eqs. (8.5) and (8.19) are therefore 
given by 



bjv+i,i 




' h ' 


W+1,2 




»au 


N+l.N+1 




&N,N 


h 








K 



7V+1 



+ 



A w+1 (z) = A N (z) + K N+1 B N (z) 



'.31) 
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B,v +1 ( z ) = ^[(K^lMz) + B N (z)}. {8.32) 

The extra z _1 arises because the Oth coefficient in the backward predictor, by definition, is zero 
(see Eq. (8.19)). By comparing Eqs. (8.25) and (8.29) we see that the optimal prediction error 
covariance matrix is updated as follows: 

E^ +1 =E^+F^ +1 )t. (8.33) 

Similarly, from (8.25) and (8.29), we obtain 

E^v+i = E w + b N K N+1 . [8.34) 

By substituting from Eq. (8.26) and (8.28) and using the fact that the error covariances are 
Hermitian anyway, we obtain the error covariance update equations 

E^ +1 = (l L - K^ +1 (K^ +1 )t) E ^ (8.35) 

and 

E^ +1 = (t - (K^ +1 )+K^ +1 ) E^ (8.36) 

Summary of Levinson's Recursion. We now summarize the key equations of Levinson's 
recursion. The subscript m is used instead of N for future convenience. Let R(^) be the L x L 
autocorrelation matrix sequence of a WSS process x(n). Then, the optimal linear predictor 
polynomials Ajv(z) and Btv(z) and the error covariances Ejy and JL N can be calculated recursively 



as follows: first, initialize the recursion according to 



A (z) = I, B (z) = z _1 I, and E 7 = E h = R(0). (8.37(a)) 



Given A m (z), B m (z), E^, and E w (allZ, x L matrices), proceed as follows: 

1. Calculate the L X L matrices 

F^ = R(-m - 1) + R(- W )a„,i + . . . + R(-l)a*,„ (8.37(b)) 

F h m = R(l)b, M + . . . + R(m)b m . m + R(m + 1) (8.37(c)) 

2. Calculate the parcor coefficients (L x L matrices) 



K +1 = -[F^tpE*]- 1 (8.37(d)) 

Kt + i = -VLVtn {8.37(e)) 
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3. Update the prediction polynomials according to 

A m+1 (z) = A ffl (z) + K^ +1 B m (z) {8.37(f)) 

B m+1 (z) = z - 1 [(K b M+1 ) t A M ( Z ) + B m (z)}. (8.37(g)) 

4. Update the error covariances according to 

E£ +1 = (l L - K,{ +1 (K b +1 )t) E,{ [8.37(h)) 

b 



F 



(l L - (K h m+i yK f m+1 ) E h m . (8.370)) 

This recursion can be repeated to obtain the optimal predictors for any order TV. 

8.6 PROPERTIES DERIVED FROM LEVINSON'S RECURSION 

We now derive some properties of the matrices that are involved in Levinson's recursion. This gives 
additional insight and places the mathematical equations in proper context. 

8.6.1 Properties of Matrices F-' and F 

y mm 

Eq. (8.37(b)) can be rewritten as 

(F£)t = R( m + 1)+ a^R( w ) + . . . + al VB R(l) 

= E [(x(«) + a^jx(« - 1) + . . . + &1 m x(n - m)) x\n-m- 1)] 
= £[e^(w)x t (w — m — \)\ 

We know the error e„,(n) is orthogonal to the samplesx(w — 1), . . .,x(« — m), but not necessarily 
to x(n — m — 1). This "residual correlation" between e. m (ri) and x(n — m — 1) is precisely what is 
captured by the matrix F ; {. The coefficient K^ +1 in Eq. (8.37(d)) is therefore called the partial 
correlation or parcor coefficient, as in the scalar case. From Eq. (8.37(d)), we see that if this "residual 
correlation" is zero, then K^ +1 = 0, and as a consequence, there is no progress in the recursion, 
that is, 

A m+1 (z) =A m (z) and E{ +1 = E£ 

How about F m ? Does it have a similar significance? Observe from (8.37(c)) that 

(F b J t = b^RC-l) + . . . + bl tm R(-m) + R(-« - 1) 

= E fb^jx(« - 1) + . . . + b' mm x(n - m) + x(» — m — 1) j x\n) 
= E\^ m {n)x\n)\ 
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The backward error e° m (n) is the error in the estimation of x(? 



1) based on the samples 



x(« — m), . . . ,x(n — 1). Thus, e° n (n) is orthogonal to x(« — m), . . . ,x(« — 1) but not necessarily 
to the sample x(«). This residual correlation is captured by F m . If this is zero, then K m+1 is zero as 
seen from Eq. (8.37(e)). That is, there is no progress in prediction as we go from stage mtom-\-l. 
Relation Between F£ and F m . We have shown that 



(F'J = E[ef(n)x\? 



1)] 



and 



(F*Jt = E[e b m (n)x\n)}. 



(8.38) 



(8.39) 



For the scalar case, the parameter a m played the role of the matrices F^ and F m . But for the 
vector case, we have the two matrices Ff n and F m , which appear to be unrelated. This gives the 
first impression that there is a lack of symmetry between the forward and backward predictors. 
However, there is a simple relation between F^ and F m credited to Burg (Kailath, 1974); namely, 

t, = (F,{) f . (8.40) 

This is readily proved as follows: 

Proof of Eq. (8.40). We can rewrite Eq. (8.38) as 



(F^ 



i£(n) (x(n -m-l)+ b^x(« - 1) + . . . + b* >IB x(n - m)j 



EH(n)(e h m (n)V 



because E[em(n)x(n — k)] =0 for 1 < k < m anyway. Similarly, from Eq. (8.39), we get 



(K) 



Mt 



«£(«) H n ) + a Li x ( w -!) + ••• + 4,„*(» 



= £[ei(»)(e/(»))t]. 

So we have shown that 

F,{ = E[J m (n)(*£(»))1] and F* = E[4(n)(^ m (n))% 
which proves (8.40). 



given in Eqs. (8.37(d)) and (8.37(e)), we have 






.41) 

D 

Ss+l 



K^E 15 and F b 



-F ^K"* 3 
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In view of the symmetry relation (8.40), it then follows that 

K f m+1 E h m = EfK h m+1 . (8.42) 

8.6.2 Monotone Properties of the Error Covariance Matrices 

We now show that the error covariances of the optimal linear predictor satisfy the following: 

E,{>E^ +1 and E h m > E h m+V (8.43) 

that is, the differences E^ — E m+l and E m — E m+1 are positive semidefmite. Because the total 
mean square error Sm is the trace of Ej n , Eq. (8.43) means in particular that 

ei>eU and £ l> £ l + i i 8 - 44 ) 

Proof of (8.43). Substituting (8.37(d)) into Eq. (8.33), we see that E f m+l = E f m - 
F m (E m ) _1 F^. In view of the symmetry relation (8.40), it then follows that 



772 + 1 



(F„{)t(E b J- 1 F/ (8.45) 



Because (E m ) is Hermitian and positive definite, it follows that the second term on the right-hand 

f 



side is Hermitian and positive semidefmite. This proves Ef n > E^ +1 . Similarly, substituting Eq. 



(8.37(e)) into Eq. (8.34), we get E h m+1 = E h m - F^E^T^, from which it follows that 

E h m+1 =E h m -F£(E')-\F^. (8.46) 

This shows that E m > E m+1 and the proof of Eq. (8.43) is complete. □ 

8.6.3 Summary of Properties Relating to Levinson's Recursion 

Here is a summary of what we have shown so far: 

1. F^ and F m have the following interpretations: 

(EfJ = E[cl(ny\n -m-l)], (F b J t = E[e h m (n)x\n)}. (8.47(a)) 

2. We can also express these as 

Ff = E[e h m (n)(ef(n))l] and F b m = E[ef(n)(e h m (n))% (8.47(b)) 

so that 

Ft = (F^)t. (8.47(c)) 
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3. The parcor coefficients are related as 

K{ +1 E b m = E/K b m+1 . {8.47(d)) 

4. The error covariances matrices satisfy 

E^>E^ +1 , E b M >E b m+lJ {8.47(e)) 

where A > B means that A — B is positive semidefinite. 

5. Traces of the error covariances satisfy £m > &m+i ano - ^m — ^m+V 

6. For completeness, we list one more relation that will only be proved in Section 8.8.1. The 
error covariances for successive prediction orders are related as 

{E f m+l f 2 = {E^ 2 (l L - P m+1 Pl +1 ) V2 {8.47(f)) 

(E b m+1 ) 1/2 = (E b ) 1/2 (l L - Pi +1 P m+1 ) V2 , {8.47(g)) 

where 

P m+1 = {V£)- 1/2 K{, +1 K) 1/2 = (E„{) t/2 K b +1 (E b )-t/^ {8.47(h)) 

and the matrices P„, satisfy 

P m Pl<I, PlP m <I. {8.47(i)) 

8.7 TRANSFER MATRIX FUNCTIONS IN VECTOR LPC 

The forward prediction polynomial (8.5) is a multi-input multi-output (MIMO) FIR filter, which 
produces the output c N {n) in response to the input x(«). Similarly backward prediction polynomial 
(8.19) produces c N {n) in response to the input x{n). These are schematically shown in Fig. 8.1(a). 
It then follows that the backward error c N {n) can be generated from the forward error c^{n) as 
shown in part (b) of the figure. We therefore see that the transfer function from e^{n) to e%{n) is 
given by 



z- 1 U N {z)=B N {z)A], 1 {z) 



A modification of this is shown in part (c) where the quantities gjy{n) and g b v («) are given by 
gfa) = {E^ 2 4{n), £ N {n) = {E h N )^ 2 c h N {n) {8 
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FIGURE 8.1: (a) Generation of the forward and backward errors using the prediction polynomials 



Atv(z) and Bjv(z), (b) generation of the backward error c N (n) from the forward error t N {n), and (c) 



generation of the normalized backward error gh(n) from the normalized forward error gjy(«) 



These are called normalized errors because their covariance matrices are equal to I. The transfer 
function from % N (n) to g^(«) is given by 



z- l G N (z)^El\- l l 2 K N {z)A~\z)[E f N YI 2 . 
The properties of Gtv(z) will be studied in later sections. 



(8.50) 



8.8 THE FIR LATTICE STRUCTURE FOR VECTOR LPC 

The update equations for predictor polynomials, given in Eqs. (8.37(f)) and (8.37(g)), can be used 
to derive lattice structures for vector linear prediction, similar to the scalar case. Recall that the 
error Cm(n) is the output of A m (z) in response to the input x(»), and similarly, e m («) is the output 
of B m (z). Thus, the update Eqs. (8.37(f)) and (8.37(g)) show that the prediction errors for order 
m + 1 can be derived from those for order m using the MIMO lattice section shown in Fig. 8.2. 

Comparing this with the lattice structure for the scalar case (Fig. 4.5), we see that there is 
a great deal of similarity. One difference is that, in the scalar case, the lattice coefficients were 
k m+ \ and its conjugate k* J+ \. But in the MIMO case, the lattice coefficients are the two different 
matrices K f m+1 and (K^ +1 ) f . 
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■> — ^ ► -j* ► e/./»; 

•x. s— *■ m + 1 1 ' 




ffi+1' ' 



FIGURE 8.2: The MIMO lattice section that generates prediction errors for order m + 1 from 
prediction errors for order m. 



8.8.1 Toward Rearrangement of the Lattice 

We now show how to bring more symmetry into the lattice structure by rewriting equations a little 
bit. Recall that the coefficients K^ +1 and K m , j are related as in Eq. (8.42). We first rearrange this. 
Because error covariances are positive semidefinite matrices, they can be factored into the form 

E = E 1/2 E t/2 , 

1 10 \ 10 

where E ' can be regarded as the left 'square root and its transpose conjugate E ' the right 
square root. Using this notation and remembering that the error covariances are assumed to be 
nonsingular, we can rewrite Eq. (8.42) as 



(e^-^kOe^ 



(F f )^ 2 K h (F h Y 
We will indicate this quantity by the notation P„,+i , that is, 



t/2 



m+i 



(E/)-^Ki +1 (E b jV2 



(E,{) t/2 Kt +1 (E b J"t/ 2 , 



that 



K 



m +1 



(Ef)^P m+1 (E h m )-V 2 and K b = (E/)- t/2 P„ +1 (E 



Mt/2 



From the definition of P m +i , we see that 



P P^ 



pf p 



f\ V2 



+l r m+1 



E£)- 1/2 (Ki' +1 )(K b +1 )t(E 



«r 1/2 K +1 M +1 (0 1/2 - 



.51) 



.52) 



(8.53) 



.54) 



132 THE THEORY OF LINEAR PREDICTION 

To understand the role of these matrices, recall the error update Eq. (8.35). This can be 
rewritten as 



E m +i — (If. K»»+i(Km+i) J E m 



(E/)V2 - K f m+1 (Kl +l )\Efy/ 2 ) (E/)t/^ 



that 



(E/)^(i L _ Pm+1 pt !+i ) (E /)t/2. {8 . 55) 






Similarly, from Eq. (8.36), we have 



% - (Kl +1 )W +1 ) (Ey/\El)^ 



(E h m y/ 2 - (Ki +1 yKf(E h m y/ 2 ) (e 



rMt/2 



(E b j i/2 (h - &r y \< + .y<A<) 1 ' 2 ) (E b j t/2 , 



E b m+1 = (E h m y 2 (l L - Pl +1 P m+1 ) (E b J t/2 . (8.56) 



that 



Thus, the modified lattice coefficient P„,+i enters both the forward and backward error updates in 
a symmetrical way. Under the assumption that the error covariances are nonsingular, we now prove 
that P m +i satisfies a boundedness property: 



and similarly 



P m+1 Pl +1 < I (8.57) 

Pl +1 P m+1 < I- (8.58) 



These can equivalently be written as 

(I - P„ + iPl +1 ) > 0, (I-Pl +1 P m+1 )>0. (8.59) 

Proof. Eq. (8.55) can be written as 

E^UtQU, 
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where Q_is Hermitian and U is nonsingular. Because E^ +1 is assumed to be nonsingular, it is 

positive definite, so that v 1 'E^ +l v > for any nonzero vector v. Thus, v'U QUv > 0. Because 

U = (E„) ' is nonsingular, this implies that w' Qw > for any vector w/0, which is equivalent 

to the statement that Q_is positive definite. This proves that I — ¥ m +l* m +i ls positive definite, 

which is equivalent to Eq. (8.57). Eq. (8.58) follows similarly. □ 

f 
Because E^ +1 is Hermitian and positive definite, it can be written in the form 

E 7 +1 =(E / +1 ) 1 / 2 (E / +1 )t/2 

m-\-l V m-\-U V m-\-U 

From Eq. (8.55), we therefore see that the left square root can be expressed as 

(E^) 172 = (E^) 1/2 (iz - P m+1 Pl +1 ) V2 • (8.60) 

The existence of the square root (If, — P m +iP„, +1 ) ' is guaranteed by the fact that (I/. — 
Pm+iP^+i) is positive definite (as shown by Eq. (8.59)). Similarly, from Eq. (8.56), we have 

(E b m+1 ) V2 = (E b J 1/2 fix - Pl +1 P m+i y /2 (8.61) 



8.8.2 The Symmetrical Lattice 

The advantage of defining the L X L matrix P„,+i from the L X L matrices K^ +1 and K m+1 is 
that it allows us to redraw the lattice more symmetrically, in terms of P m +i- Because P m +i also has 
the boundedness property (8.57), the lattice becomes very similar to the scalar LPC lattice, which 
had the boundedness property \k m \ < 1. We now derive this symmetrical lattice. From Eq. (8.51), 
we see that the lattice coefficients can be expressed as 

K{ +1 = (E^P^E^)-^ {8 .62) 

and 

K b m+1 = (E„{)-t/2p m+1 (E b Jt/ 2 . (8.63) 

Substituting these into Fig. 8.2 and rearranging, we obtain the lattice section shown in Fig. 8.3(b). 
The quantities g£(n) and g (n) indicated in Fig. 8.3(b) are given by 

g£(») = (E f m )~ 1/2 cf(n), g b » = (E b J^/2 e b {n) {864) 

These are normalized prediction error vectors, in the sense that their covariance matrix is 
identity. We can generate the set of all forward and backward normalized error sequences by using 
the cascaded FIR lattice structure, as demonstrated in Fig. 8.4 for three stages. 



e f (n) 
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► ^ ► 



(a) 




w+l' ' 



ffi+1' ' 



e f (n) 



(b) 



e (n) 




m+V y 



w+l' 



FIGURE 8.3: (a) The original MIMO lattice section and (b) an equivalent section in a more symmetrical 
form. 



The multipliers Q_^ in the figure are given by 

Qi=(E£)- 1/2 (E£ 1 ) 1 /2 ) w >! (A<J5) 

and Q_q = (Eg )~ ' . The multipliers Q_ m are similar (just replace superscript _/with £ everywhere). 
In view of Eqs. (8.60) and (8.61), these multipliers can be rewritten as 



Qi=( 



1 7". ~~ PmP. 



-1/2 



. Q^=(iz.-p1p 



-1/2 



OT > 1. 



.66) 



The unnormalized error sequence for any order can be obtained simply by inserting matrix 
multipliers (E;;) 1 / 2 and (E m ) 1//2 at appropriate places. Because 

e 3 / («) = (E3WW and e$(„) = (E^g^), 

the extra (rightmost) multipliers in Fig. 8.4(a) are therefore (E3) 1 / 2 and (E3) 1 / 2 . The lattice 
sections in Fig. 8.4 will be referred to as normalized lattice sections because the outputs at each of 
the stages represent the normalized error sequences. 
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FIGURE 8.4: (a) The MIMO FIR LPC lattice structure with normalized sections and (b) details of 
the box labeled P„. 



8.9 THE IIR LATTICE STRUCTURE FOR VECTOR LPC 

The IIR LPC lattice for the vector case can be obtained similar to the scalar case by starting from 
Fig. 8.2 reproduced in Fig. 8.5(a) and rearranging equations. From Fig. 8.5(a), we have 



/ 



and 



ei +1 («)=e„{(«) + Ki +1 e b J 



e h m+1 (n + l) = [K h m+1 ]^f(n)+ei(n). 



The first equation can be rewritten as 

e,{(«)=e{ +1 («)-K£ +1 e». 
Substituting this into the second equation, we get 



lt„/ 



It*-/ 



e b m+1 (n + 1) = [K b m+1 ]^ m+1 (n) + (I - K, + ^K + ,) <(»)• 

These equations can be "drawn" as shown in Fig. 8.5(b). By repeated application of this idea, we 
obtain the IIR LPC lattice shown in Fig. 8.6. This is called the MIMO IIR LPC lattice. The 
transfer functions indicated as H^(z) in the figure will be discussed later. 
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FIGURE 8.5: (a) The MIMO FIR lattice section and (b) the corresponding IIR lattice section. 
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FIGURE 8.6: (a) The MIMO IIR cascaded LPC lattice, and (b) details of the building block labeled 

Km- 
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8.10 THE NORMALIZED IIR LATTICE 

Starting from the symmetrical MIMO FIR LPC lattice in Fig. 8.3(b), we can derive an IIR LPC 
lattice with slightly different structural arrangement. We will call this the normalized IIR LPC 
lattice for reasons that will become clear. To obtain this structure, consider again the MIMO FIR 
lattice section shown separately in Fig. 8.7(a). This FIR lattice section can be represented using the 
equations 



/ 



and 



gi +1 («) = Qi +1 (g£M + P m+ ig b ») 



g h m+1 (n + 1) = a b +1 (pLhs^m + g b » 



The first equation can be rearranged to express g-' (n) in terms of the other quantities: 

g*(») = (Q.{,+i)' 1 gi+i(n) -P m +ig b „( K )- 
The second equation (8.68) can then be rewritten by substituting from Eq. (8.69): 



g b :(« + i) = a b +1 pi + i(a, 



/ W„/ 



+!*■ m+l\*<~m+lJ 5m+l> 



+ (Qi +1 r T g», 



(a) 




► g Jn) 

m +1 



g Jn) 

m +1 



g (n) 



(b) Qi + ,pi +1 (Q / . +1 )" 1 



H> — 



-► g„,N 



%° Jn) 

m +1 



z-'l 



(QJUi)~ 



p 

-^ 



St(n) 



(8.67) 



(8.69) 



(8. 70) 



FIGURE 8.7: (a) The normalized MIMO FIR lattice section and (b) the corresponding IIR normalized 
lattice section. 
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where we have used the definitions (8.66) for Q_^ and Q_ , that is, 



/. 



-1/2 



-1/2 



oi=(fe-pjpj,j , a: = (^-PLP m j «>i. 

The preceding two equations can be represented using the normalized IIR lattice section shown in 
Fig. 8.7(b). 

The normalized MIMO IIR LPC lattice therefore takes the form shown in Fig. 8.8 for three 
sections. For an iV-stage IIR lattice, the signal entering at the left is the forward error c N (n), and 
the signal at the right end is g/ («). Upon scaling with the multiplier 



(CU 



A-i 



(E V 2 , 



Ji 



J\ 



the error signal g/ («) becomes e/ («), which is nothing but the original signal x(«). Thus, while 
the FIR lattice produced the (normalized) error signals from the original signal x( n) , the IIR lattice 
reproduces x(«) back from the error signals. 

8.11 THE PARAUNITARY OR MIMO ALL-PASS PROPERTY 

Figure 8.9 shows one section of the IIR lattice structure. Here, G m (z) is the transfer function from 
the pervious section, and P m +i is the constant matrix shown in Fig. 8.8(b). Comparing with Fig. 
8.8(a), we see that the transfer matrices G m (z) convert the forward error signals to corresponding 
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FIGURE 8.8: (a) The normalized MIMO IIR LPC lattice shown for three stages and (b) details of 
the normalized IIR lattice sections. Here, Q_£ = (1 L - P,,^)" 1 / 2 , Q_t = (U - V ] m V m )- 1 ' 2 , m > 1, 
andQi=(E / )- 1 /2. 
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FIGURE 8.9: The normalized MIMO IIR lattice section. 



backward error signals. More precisely, if g-' («) is input to the system G m (z), then the output is 

In the scalar case, the transfer function G m (z) was all-pass, and this was used to show that 
G m+ \(z) was all-pass. Furthermore, these all-pass filters had all their poles inside the unit circle 
(Section 4.3.2). We now prove an analogous property for the vector case. This is somewhat more 
complicated than in the scalar case because the transfer functions are MIMO, and the all-pass 
property generalizes to the so-called paraunitary property to be defined below (Vaidyanathan, 
1993). We will first show that the building block indicated as P m +i has a unitary transfer matrix 
T m+ i. We then show that in an interconnection such as this, whenever G m (z) is paraunitary 
and P m +i unitary, then G m +i(z) is paraunitary as well. Because Go(z) = I is paraunitary, it will 
therefore follow that all the transfer functions G,„(z) in the normalized IIR lattice are paraunitary. 
At this point, we ask the reader to review the tilde notation G(z) described in Section 1.2.1. 

Definition 8.1. Paraunitary Systems. An M X M transfer matrix G(z) is said to be 
paraunitary if 



G t (e^)G(e^) 



(8.71) 



for alia;. That is, the transfer matrix is unitary for all frequencies. If G(z) is rational (i.e., the entries 
Gk m (z) are ratios of polynomials in z ) then the paraunitary property is equivalent to 



G(z)G( 



for all z. 



(8.72) 



With x(n) and y(n) denoting the input and output of an LTI system (Fig. 8.10), their Fourier 
transforms are related as 

Y(e^) = G(e JUJ )X(e JUJ ) 
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x(n) 



G(z) 



y(n) 



FIGURE 8.10: An LTI system with input x(ra) and output y(«). 



A number of important properties of paraunitary matrices are derived in Vaidyanathan (1993). The 
following properties are especially relevant for our discussions here: 



1. If G(e JLU ) is unitary, it follows that 

Y f (e JU )Y(e Jw ) = X f (e jw )X(e JU ) , for all u 



(8.73) 



for any Fourier transformable input x(k). 

2. It can, in fact, be shown that a rational LTI system G(z) is paraunitary if and only if (8.73) 
holds for all Fourier-transformable inputs x(n). 

3. By integrating both sides of (8.73) and using Parseval's relation, it also follows that, for a 
paraunitary system, the input energy is equal to the output energy, that is, 



^~V(h)x(k) = ^y f («)yO) 



.74) 



for all finite energy inputs. 



8.11.1 Unitarity of Building Blocks 

The building block P m +i in Fig. 8.9 has the form shown in Fig. 8.8(b). Its transfer matrix has the 
form 



■Q>t (Q /)-l (Q_b)-t 

(a 7 )- 1 -p 



.75) 



where all subscripts have been omitted for simplicity. Recall here that the matrices Q_ and Q/ are 
related to P as in Eq. (8.66), that is, 



-1/2 



We will now show that T is unitary, that is, T T = I 



b <I L P f P 



-1/2 



.76) 
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ProofofUnitarityofT. We have 



T t T 



A B 
B f D 



where the symbols above are defined as 

A = (o/r^aVoy (o/r 1 + (Q/rWr 1 , 
B = (o/y^((t)\(t)^ - (Q/r f p = o, 

D = (Q^-^Q^-t+ptp. 



So, B is trivially zero. To simplify D, we substitute from Eq. (8.76) to obtain 



eW ! a, 



\ t/2 + 
PP) +P f P 



PP +PP 



Next, substituting from Eq. (8.76), the first term of A becomes 



( 



II 



L>t/2 

PPM PI, 



pp i 



t/2 



PI 



PP 



PP 



-t/2 



PP 



-1/2 



P T I 



ppi 



1/2 



PI 



PP 



1/2 



that 



A 



h. ~ PP 1 



t/2 



P Ii-FP P f + I 



PP 



1/2 



call this Ai 



We now claim that Ai = (If, - PP 



U-l. 



>t 



)tuijt 



Ai(I L - PP f ) = P [I L - P f Pj [P 1 - P T PP T J + [I L 

= v(i L - p f p) l (l L - p f p) P f + (l L 

= PP f + fli-PP 1 ') =If, 



pp t 
ppt 



Thus, A = (I £ - PP f )t/ 2 (I x - PP^-^If, - Ppt)V2 = l ; w here we have used (If, - PP f ) = 
(I L — PP t ) 1 / 2 (I £ — PP t ) 1 '/ 2 . This completes the proof that T is unitary. □ 

8.11.2 Propagation of Paraunitary Property 

Return now to Fig. 8.9. We will show that if the rational transfer matrix G,„(z) is paraunitary and 
the box labeled P m +i has a unitary transfer matrix T m+ i (z) then G m+ i (z) is also paraunitary. 
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Proof of Paraunitarityof G m+ \(z). From Fig. 8.9, we have 



Y(e- 
Y 2 (e^ 



■ m-\-\ 



X!(e 

X 2 (e 



ju\ 



The unitary property of T m +\ implies that 

Y\ (e *> )Yi (e juJ ) + Y\(c j " )Y 2 (e ^ ) 



= Xj (e ■'" )Xi (e J " ) + X^ (e ^ )X 2 (e ^ ) , for all lu . 

Because G m (z) is paraunitary, we also have Yl(e JUJ )Y 2 (e JUJ ) = X^(e^)X 2 (e ; ") for all u. So it 
follows that 



Y\(e J ' u )Yi(e J ' w ) = X\(e ju )X 1 (c J ' u ), for all w. 
This shows that the rational function G m+ i(z) is paraunitary. 



□ 



8.11.3 Poles of the MIMO HR Lattice 

Figure 8.11 is a reproduction of Fig. 8.9 with details of the building block P„,+i shown explicitly 
(with subscripts deleted for simplicity). Here, the matrices Q/ and Q_ are as in Eq. (8.66), that is, 

-V2 . / , n-1/2 



Q/ = (I L - PPM and Q_ D = ll L - P T P 

In Section 8.8.1, we showed that the LPC lattice satisfies the property 

P f P <I 



.77) 



(see Eq. (8.58)). Under this assumption, we will prove an important property about the poles 3 of 
the transfer matrix G m+ \(z). 

Lemma 8.1. Poles of the IIR Lattice. If the rational paraunitary matrix G m (z) has all poles in 
\z\ < 1 and if the matrix multiplier P is bounded as in Eq. (8.77), then the poles of G m +i(z) are 
also confined to the region |z| < 1. <0> 

Proof. It is sufficient to show that the transfer function F(z) indicated in Fig. 8.11 has all 
poles in |z| < 1. To find an expression for F(z), note that 

Y(z) = z" 1 G ;K (z)W(z) and W(z) = -z" 1 PG m (z)W(z) +X(z), 

The poles of a MIMO transfer function F(z) are nothing but the poles of the individual elements F/, m (z). For 
this section, the reader may want to review the theory on poles of MIMO systems (Kailath, 1980; Vaidyanathan, 
1993, Chapter 13). 
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(Q 7 )"' x(n) w(n) 

H> ►■ 




FIGURE 8.11: The normalized MIMOIIR lattice section with some of the details indicated. Subscripts 
on P and Q_ are omitted for simplicity. 



where w(w) is the signal indicated in the figure. Eliminating W(z), we get Y(z) = z G m (z)(I + 
z" 1 PG m (z))" 1 X(z). This shows that 



F(z) = z-'G^il + ^PCW)" 1 . 



(8. 78) 



Because the poles of G m (z) are assumed to be in \z\ < 1, it suffices to show that all the poles of 
(I + z _1 PG m (z)) _1 are in |z| < 1. Because 



ri . - lpr , x r i Adj^ + z^PG^z)) 

(I + z PG m (z)) = -T-77— -TST^TTTV 



det(I + z- 1 PG m (z)) ' 

where Adj is the adjugate matrix, it is sufficient to show that all zeros of the denominator above are 
in |z| < 1. We will use Eq. (8.77) and the assumption that G m (z) is paraunitary with all poles in 
|z| < 1. The latter implies 



Gl(z)G m (z) < I 



.79) 



for all z in |z| > 1 (see Section 14.8 of Vaidyanathan, 1993). Now, if zq is a zero of the determinant 
of (I + z _1 PG m (z)), then I + Zq PG m (zo) is singular. So there exists a unit-norm vector v such 
that Zq PG m (zo)v = —v. This implies 

v t Gl(zo)P t PG m (z )v= |z | 2 . (8.80) 

If |zo| > 1, then G m (zo)v has norm < 1 (in view of Eq. (8.79)). So the left-hand side in Eq. (8.80) 
is < 1 in view of (8.77). This contradicts the statement that the right-hand side of Eq. (8.80) 
satisfies |zo| > 1. This proves that all zeros of the said determinant are in |z| < 1. □ 

By induction, it therefore follows that the MIMO IIR LPC lattice is stable. Summarizing, we have 
proved the following: 
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Theorem 8.2. Stability of the IIR Lattice. Assume all the matrix multipliers indicated as P m 
in the MIMO IIR lattice (demonstrated in Fig. 8.8 for three stages) satisfy P„P m < I and define 
Q_ ; { and Q_ m as in (8.66). Then, the poles of all the paraunitary matrices G m (z) in the LPC lattice 
are confined to be in |z| < 1. (} 

Equivalently, all poles of the MIMO IIR LPC lattice are in \z\ < 1 (i.e., the IIR lattice is a causal 
stable system). This is the MIMO version of the stability property we saw in Section 4.3.2 for the 
scalar LPC lattice. 

8. 12 WHITENING EFFECT AND STALLING 

Consider again Levinson's recursion for the MIMO linear predictor described in Section 8.5. 
Recall that the error covariances are updated according to Eqs. (8.37(h)) and (8.37(i)), which can 
be rewritten using (8.37(d)) and (8.37(e)) as follows: 

E^ +1 =E^-(F^)t(E^)- 1 F^ (8.81) 

E^Ek-Fj&Ej^CFjfrt. (8.82) 

f f 

We say that there is no progress as we move from order iVto order N-\- 1, if £jfu_ 1 = £jyj that is, 

TrE^ +1 = TrE^. (8.83) 

We know that the optimal prediction e^(n) at time n is orthogonal to the samples x(n — k), for 
1 < k < N. We now claim the following: 

Lemma 8.2. No-Progress Situation. The condition (8.83) arises if and only if 

(F^E[e^(n)x\n-N- 1)] = (8.84) 

that is, the error c N (n) is orthogonal to x(n — N — 1) in addition to being orthogonal to 
x(n-l),...,x(n-N). 

Proof. Eq. (8.83) is equivalentto Tr ((F^)t(E^)" 1 F^) = 0, as seenfrom Eq. (8.45). Because 
the matrix (F^^E^) F^, is Hermitian and positive semidefinite, the zero-trace condition is 
equivalent to the statement 4 

(F^)t(E^)- 1 (F^)=0. 

The trace of a matrix A is the sum of its eigenvalues. When A is positive semidefinite, all eigenvalues are 
nonnegative. So, zero-trace implies all eigenvalues are zero. This implies that the matrix itself is zero (because any 
positive semidefinite matrix can be written as A = UAU T where A is the diagonal matrix of eigenvalues and U 
unitary). 
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This is equivalent to Fjy = (because (Ejy) - is Hermitian and nonsingular). This completes the 
proof. □ 

A number of points should be noted. 

1. From Eqs. (8.82) and (8.84), we see that when there is no progress in forward prediction, 
then there is no progress in backward prediction either (and vice versa). 

2. When Eq. (8.84) holds, we have K^ +1 =0 (from (8.26)) so that A N+1 (z) =A N (z). 
Similarly K N+1 = andB7v +1 (z) = z~ 1 Bj V (z). 

We say that linear prediction "stalls" at order N if there is no further progress after stage N, 
that is, if 

£~N = '-tv+i = &N+2 = • • ■ ■ \°- °5) 

We now prove the following: 

Lemma 8.3. Stalling and Whitening. The stalling situation (8.85) occurs if and only if 

E[4{n){4)\n-m)]=E f N 5{m) 1 (8.86) 

r 

that is, if and only if c N (n) is white. (} 

Proof. From Lemma 8.2, it is clear that stalling arises if and only if 

F,{=0,t«> N, 

or equivalently, 

E[e£(n)x\n - m)] = 0, m > N+ 1. 

Because this equality holds for 1 < m < TV directly because of orthogonality principle, we can say 
that stalling happens if and only if 



E[4{n)^(n - m)} = 0, m > 1. (8.87) 



J 



But c N (n) is a linear combination of x(n — m), m > 0, so the preceding equation implies 

E[4(n)(4)\n-k)]=Q, k>l. (8.88) 

Conversely, because x(n) is a linear combination of 4( n — k),k>Q,vfe. see that Eq. (8.88) implies 
Eq. (8.87) as well. But because (8.88) is equivalent to Eq. (8.86), we have shown that the stalling 
condition is equivalent to Eq. (8.86). □ 
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8.13 PROPERTIES OF TRANSFER MATRICES IN LPC THEORY 

In the scalar case, the properties of the transfer functions Aj^(z) and B^/[z) are well understood. For 
example, y^7v(z) has all zeros strictly inside the unit circle (i.e., it is a minimum-phase polynomial). 
Furthermore, we showed that 

B N (z) = z~^A N (z). 

This implies that B^(z) has all its zeros outside the unit circle and the filter B^(z) /A^(z) is stable 
and all-pass. That is, the forward and backward error sequences are related by an all-pass filter, 
so that £]y = £ N . For the case of vector linear prediction, it is trickier to prove similar properties 
because the transfer functions are matrices. Let us first recall what we have already done. The 
transfer matrix Htv(z) from the prediction error e^,(«) to the prediction error e^,(«) is given in Eq. 
(8.48) and is reproduced below: 

z- l U N {z)=K N {z)\- N \z). {8.89) 

The transfer matrix z Gtv(z) from the normalized prediction error % N (n) to the normalized 
prediction error g N (n) is given in Eq. (8.50) and reproduced below: 

z- 1 G 7 v(z) = [E^]- 1 / 2 B 7V (z)A^(z)[E^] 1 / 2 . (8.90) 

This also appears in the MIMO IIR lattice of Fig. 8.8(a) (indicated also in Fig. 8.11). We showed 
in Section 8.11.2 that Gtv(z) is paraunitary, which is an extension of the all-pass property to the 
MIMO case. We also showed (Section 8.11.3) that all poles of Gtv(z) are in |z| < 1. 

We will start from these results and establish some more. The fact that the poles of Gtv(z) 
are in |z| < 1 does not automatically imply that the zeros of the determinant of A^v(z) are in 
| z| < 1 unless we establish that the rational form in Eq. (8.90) is "irreducible" or "minimal" in some 
sense. We now discuss this issue more systematically. We will also establish a relation between the 
forward and backward matrix polynomials A^y(z) and Btv(z). 

8.13.1 Review of Matrix Fraction Descripions 

An expression of the form 

H(z) =B(z)A _1 (z), (8.91) 

where A(z) and B(z) are polynomial matrices is called a matrix fraction description (MFD). The 
expressions (8.89) and (8.90) for z _1 Htv(z) and z~ 1 Gt V (z) are therefore MFDs. We know that the 

In this section, all polynomials are polynomials in z~ . 
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rational expression H(z) = B(z)/A(z) for a transfer function is not unique because there can be 
cancellations between the polynomials A(z) and B(z). Similarly, the MFD expression for a rational 
transfer matrix is not unique. If there are "common factors" between A(z) and B(z), then we can 
cancel these to obtain a reduced MFD description for the same H(z). To be more quantitative, 
we first need to define "common factors" for matrix polynomials. If the two polynomials A(z) and 
B(z) can be written as 

B(z) = Bi(z)R(z) and A(z) = A : (z)R(z) 

where Bi(z), Ai(z), and R(z) are polynomial matrices, then we say that R(z) is a right common 
divisor or factor between A(z) and B(z), abbreviated as red. In this case, we can rewrite Eq. (8.91) 

as 

H(z)=B 1 (z)A~ 1 (z). (8.92) 

Because 

detA(z) = detAi(z)detR(z), 

it follows that detAi(z) has smaller degree than detA(z) (unless detR(z) is constant). Thus, 
cancellation of all these common factors results in a reduced MFD (i.e., one with smaller degree 
for detA(z)). We say that the MFD (8.91) is irreducible if every red R(z) has unit determinant, 

detR(z)=l, (8.93) 

or more generally, a nonzero constant. A matrix polynomial R(z) satisfying Eq. (8.93) is said to 
be unimodular. Thus, an MFD (8.91) is irreducible if the matrices A(z) and B(z) can have only 
unimodular reds. In this case, we also say that A(z) and B(z) are right coprime. 

Any unimodular matrix is an red. GivenA(z) and B(z), we can always write 

A(z) = A(z)U _1 (z) xU(z), B(z) = B(z)U~ 1 (z) xU(z) 
polynomial Aj(z) polynomial B x (z) 

for any unimodular polynomial U(z). The inverse U (z) remains a polynomial because detU(z) = 
1. Thus, Ai(z) and B^(z) are still polynomials. This shows that any unimodular matrix can be 
regarded as an red of any pair of matrix polynomials. □ 

The following results are well-known in linear system theory (e.g., see Lemmas 13.5.2 and 
13.6.1 in Vaidyanathan, 1993). 

Lemma 8.4. Irreducible MFDs and Poles. Let H(z) = B(z)A~ (z) be an irreducible MFD. 
Then, the set of poles of H(z) is precisely the set of zeros of [detA(z)]. <0> 
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Lemma 8.5. Irreducible and Reducible MFDs. Let H(z) = B(z)A~ (z) be an MFD and 

H(z) = B re a(z)A~ d (z) an irreducible MFD forthe same systemH(z). Then, B(z) = B re d(z)R(z) 
and A(z) = A re a(z)R(z) for some polynomial matrix R(z). "0 

8.13.2 Relation Between Predictor Polynomials 

We now prove a crucial property of the optimal MIMO predictor polynomials. The property 
depends on the vector version of Levinson's upward recursion Eqs. (8.37(f))— (8.37(g)). First note 
that we can solve for A m (z) and B„,(z) in terms of A m+ i(z) and B m+ i(z) and obtain 

A m (z) = (i - K£ +1 (K^ +1 )t) _1 ( Am+1 (z) - zK f m+1 B m+1 (z)) (8.94) 



B m (z) = (i - (K^ +1 )t K £ +1 ) (zB m+1 ( z ) - (Kl +1 )^A m+1 ( z )) . (8.95) 

This is called the downward recursion because we go from m + 1 to m using these equations. 
The inverses indicated in Eq. (8.94) and (8.95) exist by virtue of the assumption that the error 
covariances at stage m and m + 1 are nonsingular and related as in Eqs. (8.35) and (8.36). 

Lemma 8.6. Coprimality of Predictor Polynomials. The optimal predictor polynomials A^v(z) 
and Btv(z) are right coprime for any N, so that the MFD given by B^(z)A. N (z) is irreducible. <0 

Proof. If R(z) is an red of A m+ i(z) and B m+1 (z), then we see from Eqs. (8.94) and (8.95) 
that it is also an red of A m (z) and B m (z). Repeating this argument, we see that any red R(z) of 
Atv(z) andBTv(z) foriV> must be an red of Ao(z) andB (z). ButA)(z) = I andBo(z) = z~ I, 
so these have no red other than unimodular matrices. Thus, R(z) has to be unimodular, or 
equivalently, A^v(z) and Btv(z) are right coprime. □ 

We are now ready to prove the following result. The only assumption is that the error covariances 
Ej(, and E^y are nonsingular (as assumed throughout the chapter). 

Theorem 8.3. Minimum-Phase Property. The optimal predictor polynomial matrix Atv(z) is 
a minimum-phase polynomial, that is, all the zeros of [detA^z)] are in |z| < 1. <0> 

Proof. From Eq. (8.90), we have 

B 7V (z)A^ 1 (z) = z- 1 (E^) 1 / 2 G 7V (z)(E^)- 1 / 2 . 

From Theorem 8.2, we know that Gtv(z) is a stable transfer matrix (all poles restricted to |z| < 1). 
This shows that Btv(z)A^ (z) is stable as well. Because Btv(z)A^ (z) is an irreducible MFD 
(Lemma 8.6), all the zeros of [detA^z)] are poles of B7v(z)A w (z) (Lemma 8.4). So all zeros of 
[detATv(z)] are in |z| < 1. □ 
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Refer again to Eq. (8.90), which is reproduced below: 

z-'G^z)^ [E^-^B^z) A^topEjfl 1 / 2 = B( Z )A" 1 ( Z ) (8.96) 

v " v ' 

call this B (z) call this A - ' (z) 
Because Gtv(z) is paraunitary, it follows that B(z)A (z) is paraunitary as well. Thus, 

A _1 (z)B(z)B(z)A-'(z) = 1 



from which it follows that 



That is, 



B(z)B(z) = A(z)A(z) forallz. (8.97) 



B^z) [E h N ] "^(z) = A w (z) [E$ "^(z) . (8. 98) 

So, the forward and backward predictor polynomials are related by the above equation. For the 
scalar case, note that we had £ N = £^, so that this reduces to Bn(z)Bn(z) = An(z)An(z), which, 
of course, is equivalent to the statement that Bn(z)/An(z) is all-pass. Summarizing, the main 
properties of the optimal predictor polynomials for vector LPC are the following: 

1. Atv(z) and Btv(z) are right coprime, that is, J$n(z)A n (z) is irreducible (Lemma 8.6). 

2. Atv(z) has minimum-phase property (Theorem 8.3). 

3. Btv(z) [E^] _1 Bjv(2;) = A N (z) [Efy "^(z), or equivalently, 

z- 1 G 7V (z)^[E^]- 1 /2B 7V (z)A 7V 1 (z)[E^] 1 /2 ( 8 . 99 ) 

is paraunitary. 

8.14 CONCLUDING REMARKS 

Although the vector LPC theory proceeds along the same lines as does the scalar LPC theory, there 
are a number of important differences as shown in this chapter. The scalar all-pass lattice structure 
introduced in Section 4.3 has applications in the design of robust digital filter structures (Gray 
and Markel, 1973, 1975, 1980; Vaidyanathan and Mitra, 1984, 1985; Vaidyanathan et al., 1986). 
Similarly, the paraunitary lattice structure for MIMO linear prediction can be used for the design of 
low sensitivity digital filter structures, as shown in the pioneering work of Rao and Kailath (1984). 
The MIMO lattice is also studied by Vaidyanathan and Mitra (1985). 
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APPENDIX A 

Linear Estimation of Random 
Variables 



Let X\, x%, ■ ■ ■ , Xn be a set of N random variables, possibly complex. Let x be another random 
variable, possibly correlated to Xi in some way. We wish to obtain an estimate of x from the observed 
values of #,-, using a linear combination of the form 

x = —a\x\ — 02 %2 ' ' ' _ a N x N- \A--L) 

The use of —a* rather than just «; might appear to be an unnecessary complication, but will become 
notationally convenient. We say that x is a linear estimate of x in terms of the "observed" variables 
X- v The estimation error is defined to be 

e = x — x, (A.2) 

so that 

e = x + a\ X\ + #2 x 2 + • • • + cin x n- (d-3) 

A classic problem in estimation theory is the identification of the constants a,- such that the mean 
square error 

£=E[\e\\ (A.4) 

is minimized. The estimate x'v* then said to be the minimum mean square error (MMSE) linear 
estimate. 

A.1 THE ORTHOGONALITY PRINCIPLE 

The minimization of mean square error is achieved with the help of the following fundamental 
result: 

TheoremA.l. Orthogonality Principle. The estimate x in Eq. (A.l) results in minimum mean 
square error among all linear estimates, if and only if the estimation error e is orthogonal to the N 
random variables x;, that is, E[ex* ] = for 1 < i < N. <0> 
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Proof. Let x± denote the estimate obtained when the orthogonality condition is satisfied. 
Thus, the error 

e±=x — x± (A. 5) 

satisfies 

E[e ± xf] = 0, 1 < i < N. {A. 6) 

Let x be some other linear estimate, with error e. Then, 

e = x — 'x = e± -\- xj_ — x (from Eq. (A.5)). 

This has the mean square value 

E[ \e\ 2 ] =E[ \e ± \ 2 ] + E[ \x± -x\ 2 ] + E[(x± -x)el] + E[e±(x* -**)]. {A.7) 

Because x± and x are both linear combinations as in Eq. (A.l), we can write 

N 

'xj_ —'x=y C{X{. [A. 8) 

We therefore have 

TV 

E[e ± (xl - x*)} = E[e± ^ cfxf } (from Eq. (A.8)) 
(=1 

TV 

(=1 

= (from Eq. (A.6)), 

so that Eq. (A.7) becomes 

E[ \e\ 2 } =E[ \e ± \ 2 } + E[ \x± -x\ 2 }. 
The second term on the right-hand side above is nonnegative, and we conclude that 

E[\e\ 2 ]>E[\e ± \ 2 }. 
Equality holds in the above, if and only if ~xj_ =x. □ 

A.2 CLOSED-FORM SOLUTION 

If we impose the orthogonality condition on the error (A.3), we obtain the TV equations 

E[ex*\ =0, 1 < i < N (orthogonality). {A. 9) 
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We use these to find the coefficients a- u which result in the optimal estimate. (The subscript _L 
which was used in the above proof will be discontinued for simplicity.) Defining the column vectors 

x = [x 1 x 2 . . . x N ] , a = [a\ a 2 . . . a N ] , 



we can write the error e as 

e = x + a'x, 
and the orthogonality condition Eq. (A.9) as 

E[e^) = 0. 
Substituting from Eq. (A.10), we arrive at 

E[xx*} + a t £[xx t ] =0. 
In the above equation, the matrix 

R=£[xx f ] 



(A.10) 

(A. 11) 
(A. 12) 



,t 



is the correlation matrix for the random vector x. This matrix is Hermitian, that is, R = R. 
Defining the vector 

t = E[xx% 

Eq. (A. 12) simplifies to 

Ra = -r. 

More explicitly, in terms of the matrix elements, this can be written as 



E[ \xi\ 2 ] E[xiX2] ■ ■ ■ E[xiXn] 
E[x 2 x*] E[ \x 2 \ 2 ] ■ ■ ■ E[x 2 x%\ 

E[x N x*\ E[x N x 2 ] . . . E[ \xn\ ] 





«1 




i?[xi^*] 




«2 


= — 


i?[x2^*] 




a N 




E[xfjx*] 



(A.13) 



R a _ r 

The vector r is the cross-correlation vector. Its z'th element represents the correlation between the 
random variable x and the observations #,-. If the correlation matrix R is nonsingular, we can solve 
for a and obtain 



a = — R r (optimal linear estimator). 



(A.14) 
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The autocorrelation matrix R is positive semidefinite because xx' is positive semidefinite. If in 
addition R is nonsingular, then it is positive definite. The case of singular R will be discussed in 
Section A.4. 

The above development does not assume that x or Xi have zero mean. Notice that if the 
correlation between x and the observation X{ is zero for each i, then r = 0. Under this condition, 
a = 0. This means that the best estimate is x = and the error is e = x itself! 

A.3 CONSEQUENCES OF ORTHOGONALITY 

Because the estimates is a linear combination of the random variables x,-, the orthogonality property 
(A.9) implies 

E[ex*} = 0. {A. 15) 

That is, the optimal linear estimate 'x is itself orthogonal to the error e. The random variable x and 
its estimate x are related as x = x + e. From this, we obtain 

E[ \x\ 2 ) =E[ |x| 2 ] + E[ \e\ 2 } {A. 16) 

£ 

by exploiting Eq. (A.15). 

Geometric Interpretation. A geometric interpretation of orthogonality principle can be given 
by visualizing the random variables as vectors. Imagine we have a three-dimensional Euclidean 
vector x, which should be approximated in the two-dimensional plane \x\, x%) (Fig. A.l). We see 
that the approximation error is given by the vector e and that the length of this vector is minimized 
if and only if it is perpendicular to the X\, X2 plane. Under this condition, the lengths of the vector 
x, the error e, and the estimate 'x are indeed related as 

(length ofx) 2 = (length of x) 2 + (length of e) 2 , 

which is similar to Eq. (A.16). The above relation is also reminiscent of the Pythagorean 
theorem. □ 

Expression for the Minimized Error. Assuming that the coefficients a t have been found, we 
can compute the minimized mean square error as follows: 

£ = E[\e\ 2 } = E[ee*} = E[e(x* + x f a)] 

= E[ex*] (by orthogonality (A.ll)). 
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FIGURE A.l: Pertains to the discussion of orthogonality principle. 

Using Eq. (A.10) and the definition of r, this can be rewritten as 

E[\e\ 2 } =E[ \x\ 2 } + a f r 



{A. 17) 



Example A.l: Estimating a Random Variable. Let x and x\ be real random variables. We 
wish to estimate x from the observed values of x\ using the linear estimate 

x = —a\X\. 

The normal equations are obtained by setting A^ = 1 in Eq. (A.13), that is, 

E[x r ] x a\ = — E[xix]. 

Thus, 

-E\x\x\ 



«l 



E[x\] 



The mean square value of the estimation error is 



E[e 2 } = E[x 2 } 



(E^x]) 2 



[fromEq. (A. 17)]. 



Thus, a\ is proportional to the cross-correlation _E[a:ia:] . If iJ^x] = 0, then a\ = 0, that is, the best 
estimate is x = 0, and the mean square error is 

E[e ] = E[x ] = mean square value of x itself! 
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A.4 SINGULARITY OF THE AUTOCORRELATION MATRIX 

If the autocorrelation matrix R defined in Eq. (A.13) is singular, (i.e., [detR] = 0), we cannot 
invert it to obtain a unique solution a as in Eq. (A. 14). To analyze the meaning of singularity, recall 
(Horn and Johnson, 1985) that R is singular if and only if Rv = for some vector v/ 0. Thus, 
singularity of R implies v* Rv = 0, that is, 

v t J B[xx t ]v=0, v/0. 

Because v is a constant with respect to the expectation operation, we can rewrite this as E[v'xx.' v] = 
0. That is, 

E\ Iv^l 2 ! = 0. 



This implies that the scalar random variable v' x is zero, that is, 

V*X\ + ^2*2 + • • • + V$jX N 



v'x 



0. 



T 



T 



with x = x\ #2 • • • x n an d v = v\ v% . . . vn ■ Because v 7^ 0, at least one of components 
vi is nonzero. In other words, the N random variables x- t are linearly dependent. Assuming, for 
example, that v^ is nonzero, we can write 



x N 



[v*xi + ... +vn-\x n _i]/vn- 



That is, we can drop the term a%Xfq from Eq. (A.3) and solve a smaller estimation problem, without 
changing the value of the optimal estimate. We can continue to eliminate the linear dependence 
among random variables in this manner until we finally arrive at a smaller set of random variables, 
say, Xi, 1 < 1 : < L, which do not have linear dependence. The L x L correlation matrix of these 
random variables is then nonsingular, and we can solve a set of L equations (normal equations) 
similar to Eq. (A.13), to find the unique set of L coefficients «;. 

Example A.2: Singularity of Autocorrelation. Consider the estimation of x from two real 
random variables x\ and xj- Let the 2x2 autocorrelation matrix be 



R 



1 a 



We have [det R] 



so that R is singular. We see that the vector v 



has the 



property that Rv = 0. From the above theory, we conclude that x\ and xi are linearly dependent 
and ax\ — X2 = 0, that is, 

X2 = ax\. 

In other words, xj is completely determined by X\. 
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APPENDIX B 

Proof of a Property o 
Autocorrelations 




(B.l) 
(B.2) 



The autocorrelation R(i) of a WSS process satisfies the inequality 

*(o)>TO|. 

To prove this, first observe that 

E\x(0) -e j4, x(-k)\ 2 >0 

for any real constant <f>. The left-hand side can be rewritten as 

E [|*(0)| 2 + \x(-k)\ 2 - e J ' l 'x*(0)x(-k) - e- ;0 *(O)**(-^)] 
= 2R(0) - e^R^k) - t-^R^k). 

Because this holds for any real <f>, let us choose <fi so that R* (k)e Jl ^ is real and nonnegative, that is, 
R*(k)t# = \R(k)\.Thus, 

E\x{0) - t J,p x{-k)\ 2 = 2R{0) - 2\R(k)\ > 0, 

from (B.2). This proves Eq. (B.l). Equality in (B.l) is possible if and only if 

E\x(0) -t^xi-k)] 2 = 0, 

that is, 

E\x{n) -e j4> x(n-k)\ 2 = 

for all n (by WSS property). Thus, 

x{n) = t J x(n — k), 

which means that all samples of x(n) can be predicted perfectly if we know the first k samples 
x(0), . . . , x{k — 1). Summarizing, as long as the process is not perfectly predictable, we have 

R(0) > \R(k)\ (B.3) 

for k^O. 
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APPENDIX C 

Stability of the Inverse Filter 



We now give a direct proof that the optimum prediction polynomial A^{z) has all zeros p, 
inside the unit circle, that is, \p,\ < 1, as long as x(n) is not a line spectral process (i.e., not fully 
predictable). This gives a direct proof (without appealing to Levinson's recursion) that the causal 
IIR filter \/An(z) is stable. This proof appeared in Vaidyanathan et al. (1997). Also see Lang and 
McClellan (1979). 

Proof. The prediction filter reproduced in Fig. C.l(a) can be redrawn as in Fig. C.l(b), 
where q is some zero of A^yz) and Cjv-i(z) is causal FIR with order N — 1. First observe that 
1 — qz is the optimum first-orderprediction polynomial fortheWSS process_y(»), for otherwise, 
the mean square value of its output can be made smaller by using a different q, which contradicts 
the fact thaty^v(z) is optimal. Thus, q is the optimal coefficient 



RyyiX) 
R yy {Q) 



(CI) 



where Ryy(k) is the autocorrelation of the WSS process y(n). From Appendix B, we know that 
Ryy{Q) ^ \R yy (k)\ for any k th.a.t shows 



M<i- 



This proves that all zeros of An(z) satisy \zk\ < 1. To show that, in fact, \q\ < 1 when x{n) 
is not a line spectral process, we have to work a little harder. Recall that the optimal e N {n) is 
orthogonal to x{n — 1), ... , x{n — N) (Section 2.3). Because y{n — 1) is a linear combination of 
x{n — 1), . . .,x{n — N), it is orthogonal to e N {n) as well: 



E[e f N {n)f{n - 1)] = 0. (C.2) 
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(a) x(n) ► 



A N (z) 

FIR prediction filter 



"► e f N (") 



x(n)- 



(b) 



C N-l( z ) 



y(n) 



1 -qz- 1 



-*ef(n) 



FIGURE C.l: (a) The prediction filter and (b) an equivalent drawing, with a factor (1 — gz 1 ) shown 
separately. 



r 

Thus, using the fact that e N {n) = y{n) — qy{n — 1), we can write 

E f N = E[\e f N {n)\ 2 } = E[e f N {n)(y{n) - gy(n - 1))*] 
= E[e f N {n)f{n)} (from (C.2)) 

= E \{y( n ) - gy( n - !))/(«)] 

= R yy (0) - gR; y (l) 

= R yy (0)(l - \g\ 2 ) (from (C.l)). 

Because x(n) is not a line spectral process, we have £^ ^ 0, that is, £jy > 0. Thus, (1 — \q\ ) > 0, 
which proves \q\ < 1 indeed. □ 
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APPENDIX D 




Recursion Satisfied by AR 
Autocorrelations 



Let a\ n denote the optimal iVth-order prediction coefficients and Ejy the corresponding prediction 
error for a WSS process x(n) with autocorrelation R(k). Then, the autocorrelation satisfies the 
following recursive equations: 

R(k) = { -«n,\R(* - 1) - «fa*(* - 2) ... - a* N , N R{k - N) + £& k = 
" | -«jv,i#0* - 1) - «iv.2^ - 2) ... - «&yU(* -iV), 1 < k < N. 

In particular, if the process x(n) is AR(A^), then the second equation can be made stronger: 

R(k) = -a$f tX R{k - 1) - a^ 2 R(k - 2) ... - a^ N R{k -N), for all k > (D.2) 

Proof. The above equations are nothing but the augmented normal equations rewritten. To 
see this, note that the Oth row in Eq. (2.20) is precisely the k = case in (D.l). For 1 < k < N, 
the Mi equation in (2.20) is 

R*(k) + a NA R*(k - 1) + . . . + a N>N R*{k - N) = (using R(i) = R*(-i)). 

Conjugating both sides, we get the second equation in Eq. (D.l). For the case of the AR(iV) 
process, we know that the (N+ w)th-order optimal predictor for any m > is the same as the 
iVth-order optimal predictor, that is, 

J aN,n 1 < n < N 
n > N. 

So, the second equation in Eq. (D.l), written foriV+ m instead of TV yields 

R(k) = -a* m R(k - 1) - a* N ,iR{k - 2) ... - a% >N R(k -N), 1 <k<N+m 

for any m > 0. This proves Eq. (D.2) for AR(iV) processes. D 
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Problems 




1. Suppose x{n) is a zero-mean white WSS random process with variance a . What are the 
coefficients a/v,- of the N th-order optimal predictor? What is the prediction error variance? 

2. Consider the linear prediction problem for a real WSS process x{n) with autocorrelation 
R(k). The prediction coefficients tf/vj are now real. Show that the mean square prediction 



error £j^ can be written as 



where 



£^ = J R(0)+a T R 7V a + 2a T r, 



a N,l a N,2 ■ ■ ■ a N,N 



•A 



and Rtv and r are as in Eq. (2.8). By differentiating ££ f with respect to each a^ ,- and setting 
it to zero, show that we again obtain the normal equations (derived in Section 2.3 using 
orthogonality principle). 



3. Consider a WSS process x(n) with autocorrelation 

f b os\k\ 



R(k) 







for k even 
for k odd 



(P.3) 



where < b < 1. 

a) Compute the power spectral density S xx (e JUJ ). 

b) Find the predictor polynomials Ai{z),A2{z), the lattice coefficients k\, h-2, and the error 
variances £■[, £ 2 using Levinson's recursion. 

c) Show that e£ (n) is white. 

4. Someone performs optimal linear prediction on a WSS process x{n) and finds that the first 
two lattice coefficients are k\ = 0.5 and k 2 = 0.25 and that the second-order mean square 
prediction error is £ 2 = 1-0. Using these values, compute the following. 



a) The predictor polynomials y^i(z) and^C 2 )- 

b) The mean square errors £„ and £/. 

c) The autocorrelation coefficients R(m) of the process x[n), for m 



0,1,2. 



164 THE THEORY OF LINEAR PREDICTION 

5. Someone performs optimal linear prediction on a WSS process x{n) and finds that the first 
four lattice coefficients are 

k x = 0, k 2 = 0.5, h = 0, k A = 0.25, (P.5) 

and that the fourth-order mean square prediction error is & A = 2.0. Using these values, 
compute the following: 

a) The predictor polynomials A m (z) form = 1,2,3,4. 

b) The mean square errors Em for m = 0,1,2,3. 

c) The autocorrelation coefficients R(m) of the process x(n), for m = 0,1,2,3,4. 

6. Show that the quantity a m arising in Levinson's recursion can be expressed in terms of a 
cross-correlation, that is, 

c& = E[e£(n)x*{n - m - 1)]. (P.6) 

7. Estimation From Noisy Data. Let v be a noisy measurement of a random variable u, that is, 

v = u + e, (P. 7a) 

where the random variable e represents noise . Assume that this noise has zero mean and 
that it is uncorrected with u. From this noisy measurement v, we would like to find an 
estimate of u of the form a = av. Let e = u — 'u be the estimation error. Compute the value 
of a, which minimizes the mean square error E\\e\ ]. Hence, show that the best estimate of 
u is 



u 



R„ 



Ruu + O" 



v, {P.7b) 



where R uu = E\\u\ 2 ] and a\ = -E[|e| 2 ] (noise variance). Show also that the minimized mean 
square error is 

E[\e\ 2 ] = -^- 2 . (P7c) 

Thus, when the noise variance a \ is very large, the best estimate approaches zero; this means 
that we should not trust the measurement and just take the estimate to be zero. In this 
extreme case, the error -E[|e| ] approaches R uu . Thus, with the above estimate, the mean 
square estimation error will never exceed the mean square value R ml of the random variable 
a to be estimated, no matter how strong the noise is! 
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8. Valid Autocorrelations. Let R be a Hermitian positive definite matrix. Show that it is a 
valid autocorrelation matrix. That is, show that there exists a random vector x such that 

R = £[xx t ]. 

9. Let J denote the N X N antidiagonal matrix, demonstrated below for N = 3. 

(P. 9) 



001 
010 
100 



If A is any N X N matrix, then JA represents a matrix whose rows are identical to those of 
A but renumbered in reverse order. Similarly, AJ corresponds to column reversal. 

a) Now assume that A is Hermitian and Toeplitz. Show that JAJ = A . 

b) Conversely, suppose JAJ = A . Does it mean that A is Hermitian and Toeplitz? Justify. 

c) Suppose JAJ = A and A is Hermitian. Does it mean that A is Toeplitz? Justify. 

10. Consider a WSS process x{n) with power spectrum 

Me*") = r- \~f! - K P < 1. (P.10) 

1 + p l — 2p cos u) 

Let R2 represent the 2x2 autocorrelation matrix of x(n) as usual. Compute the eigenvalues 
of this matrix and verify that they are bounded by the extrema of the power spectrum (i.e., 
as in Eq. (2.28)). 

11. Condition number and spectral dynamic range. Eq. (2.31) shows that if the power spectrum 
has wide variations (large ^max/Smin), then the condition number of R^v can, in general, be 
very large. However, this does not necessarily always happen: we can create examples where 
£max/omin is arbitrarily large, and yet the condition number of R^y is small. For example, 
given the autocorrelation R(k) with Fourier transform S xx (e JU '), define a new sequence as 
follows: 

J R(k/N) if k is multiple ofN 
1 otherwise. 

a) Find the Fourier transform S yy (e JUJ ) of R yy {k), in terms of S xx (e JUJ ). 

b) Argue that R yy {k) represents a valid autocorrelation for a WSS process y(n). 

c) For the process y(n), write down the N x N autocorrelation matrix. What is its condition 
number? 
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d) For the process x(n), let 7 denote the ratio S max / S mm in its power spectrum S xx (e J "). 
For the process y(n), what is the corresponding ratio? 

12. Consider two Hermitian matrices A and B such that A is the upper left submatrix of B, 
that is 



B 



A x 

X X 



{P. 12) 



a) Show that the minimum eigenvalue of B cannot be larger than that of A. Similarly, 
show that the maximum eigenvalue of B cannot be smaller than that of A. Hint. Use 
Rayleigh's principle (Horn and Johnson, 1985). 

b) Prove that the condition number J\f of the autocorrelation matrix R m of a WSS process 
(Section 2.4.1) cannot decrease as the matrix size m grows. 

13. Consider a WSS process x(n) with the first three autocorrelation coefficients 

R(0) = 3, R(l) = 2, R{2) = 1. (P.13) 

a) Compute the coefficients of the first and the second-order optimal predictor polynomials 
A\\z) and A2 (z) by directly solving the normal equations. 

b) By using Levinson's recursion, compute the predictor polynomials and mean square 
prediction errors for the first- and second-order predictors. 

c) Draw the second-order FIR LPC lattice structure representing the optimal predictor and 
indicate the forward and backward prediction errors at the appropriate nodes. Indicate 
the values of the lattice coefficients k\ and ki- 

d) Draw the second-order IIR lattice structure representing the optimal predictor and 
indicate the forward and backward prediction errors at the appropriate nodes. Indicate 
the values of the lattice coefficients k\ and kj. 

14. Valid Toeplitz autocorrelation. Consider a set of numbers R(k),0 < k < N and define 
the matrix R^v+i as in Eq. (2.20). By construction, this matrix is Hermitian and Toeplitz. 
Suppose, in addition, it turns out that it is also positive definite. Then, show that the numbers 
R{k)> < ^ < iV, are valid autocorrelation coefficients of some WSS process x(n). Give a 
constructive proof, that is, show how you would generate a WSS process x(n) such that its 
first iV + 1 autocorrelation coefficients would be the elements R(0), -K(l), • • • , R(N). 

15. Show that the coefficients b^ -j of the optimal backward predictor are indeed given by 
reversing and conjugating the coefficients of the forward predictor, as shown in Eq. (4.3). 
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16. Let e m (n) denote the forward prediction error for the optimal wth-order predictor as usual. 
Show that the following orthogonality condition is satisfied: 



J(n)[e{(n-1)Y 



0, m > 



(P. 16) 



17. In Problem 4, suppose you are informed that x(n) is actually AR(2). Then, compute the 
autocorrelation R(m) form = 3,4. 

18. In Example 2.1, the autocorrelation R(k) of a WSS process x(n) was supplied in closed 
form. Show that the process x(n) is in fact AR(2), that is, the prediction error el in) is 
white. 

19. Compute and sketch the entropy of a Gaussian WSS process, with autocorrelation R(k) = 
p' ', — 1 < p < 1. Make qualitative remarks explaining the nature of the plot. 

20. Consider a Gaussian AR(2) process with R(0) = 1, for which the second-order predictor 
polynomial is 



Mz) 



1--Z' 

2 



1- -2f 

3 



(P. 20) 



Compute the entropy of the process and the flatness measure 7^. 



21. Consider a real WSS Gaussian process x{n) with power spectrum sketched as shown in 
Fig. P.21. 
a) Compute the entropy of the process. 



S xx (e ) 



— I 1 

nlA nil 

FIGURE P.21 
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b) Compute the limiting value £so of the mean square prediction error. 

c) Compute the flatness measure 7^. 

22. Consider a WSS process x(n) with autocorrelation R(k). Let the 3x3 autocorrelation 
matrix have the form 



R, 



1 


P 


a 


p 


1 


P 


a 


P 


1 



(P. 22) 



where p and a are real and — 1 < p < 1. (Thus, R(0) = 1, R(l) = p, and R(2) = a.) If 
you are told that x(k) is a line spectral process of degree = 2, what would be the permissible 
values of a? For each of these permissible values of a, do the following: 

a) Find the line frequencies U0\ and uJj and the powers at these line frequencies. 

b) Find the recursive difference equation satisfied by R(k). 

c) Find a closed form expression for R(i). 

d) Compute R(3) andi?(4). 

23. We now provide a direct, simpler, proof of the result in Theorem 7.4. Let y(n) be a WSS 
random process such that if it is input to the FIR filter 1 — az~ the output is identically 
zero (see Fig. P.23) 

a) Show that the constant a must have unit magnitude. (Assume, of course, that y{n) itself 
is not identically zero.) 

b) Now let x{n) be a WSS process satisfying the difference equation 



x{n) 






*x{n — i). 



(P.23) 



That is, if x(n) is input to the FIR filter A(z) = 1 + ^<=i a * z ~'> tri e output is identically 
zero. Furthermore, let there be no such FIR filter of lower order. Show then that all the 
zeros of A (z) must be on the unit circle. 



y(n) 



1 -az 



for all n 



FIGURE P.23 
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24. Let R m be the m X m autocorrelation matrix of a Linespec(-L) process x(n). We know that 
R/, has rank L and Rl+i is singular, with rank L. Using (some or all of) the facts that R,„ is 
Hermitian, Toeplitz, and positive semidefinite for any m, show the following. 

a) R m has rank m for all m < L. 

b) R m has rank exactly L for any m > L. 

25. Show that the sum of sinusoids considered in Example 7.3 is indeed WSS under the three 
conditions stated in the example. 

26. Suppose a sequence of numbers 

a ,a-i,a 2 , • • • (P. 26a) 

tends to a limit a. This means that, given e > 0, there exits a finite integer N, such that 

\cti -a\<e for i > N. (P. 26b) 

a) Show that 

M-\ 



im — > a.-, = a. (P. 26c) 



lim 

M-^oo M 

1=0 



,M-1 



In other words, define a sequence (3m — \\/M) ^ != q a i an< i show that it tends to a. 
b) Suppose f{x) is a continuous function for all x in a given range 1Z. Thus, for any fixed x 
in the range, if we are given a number e > 0, we can find a number 5 > such that 

|/(*l) -/(*)| < e for|x! -x| < (5. (P.26d) 

Let all members a z - of the sequence (P.26a) be in the range 1Z. Show then that 

lim/(a,-) =f( lim a,) =/(«)■ (P.26s) 

z — >oo Vz — >oo / 

27. Let /?i(^J and R-2(k) represent the autocorrelations of two WSS random processes. Which 
of the following sequences represent valid autocorrelations? 

a) R 1 (k)+R 2 (k) 

b) Ri(k)-R 2 (k) 

c) Ri(k)R 2 (k) 

d) j: n R 1 (n)R 2 (k-n) 

e) ^(2^) 
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f) R(k) 



R\{k/2) for k even 
for k odd. 

28. Leti?(^) be the autocorrelation of azero mean WSS process and suppose R[k\) = Rikx) 7^0 
for some k.2~> k\~> 0. Does this mean that R(i) is periodic? Note. If k\ =0 and ^2 > 0> 
then the answer is yes, as asserted by Theorem 7.2. 

29. Find an example of the autocorrelation R(k) for a WSS process such that, for some m, 
the autocorrelation matrices R m and R m +i have the same minimum eigenvalue, but the 
minimum eigenvalue of R m +2 is smaller. 

30. Let xandy be real random variables such that E\xy\ = £[#]2?|V], that is, they are uncorrelated. 
This does not, in general, imply that ^[x^y 7 ] = i?[x' c ']£'[)' n ] (i.e., that x k and y" are 
uncorrelated) for arbitrary positive integers k, n. Prove this statement with an example. 

31. It is well-known that the power spectrum S xx (e JUJ ) of a scalar WSS process is nonnegative 
(Papoulis, 1965; Peebles, 1987). Using this, show that the power spectral matrix S xx (e- /W ) 
of a WSS vector process x(») is positive semidefinite. 

32. Let x(n) andjy(w) be zero mean WSS processes, with power spectra ^(e 7 ") and S yy (e.-' UJ ). 
Suppose S xx (e, JU) )Syy{e, JU) ) = for all Ui, that is, these spectra do not overlap as demonstrated 
in the figure. 

a) This does not mean that the processes are necessarily uncorrelated. Demonstrate this. 

b) Assume, however, that the above processes are not only WSS but also jointly WSS (i.e., 
E[x(n)y*(n — k)] does not depend on «). Show then that they are indeed uncorrelated. 



s^n 



Syy(e ia ) 




CO 



2k 



FIGURE P.32 
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